Introduction

Short tandem repeats (STRs) are relatively mature genetic markers that were widely used in forensic applications. Genotyping of X-chromosomal STR (X-STR) markers has developed into a valuable method used by many forensic researchers as a supplement to the information provided by autosomal STR (AS-STR), Y-chromosomal STR (Y-STR), and mitochondrial DNA, particularly in deficiency and complex kinship cases1,2. These different approaches can complement each other with their characteristic genetic information. For instance, the ancestral information of populations deduced from genetic analyses could be quite different depending on the type of genetic marker used3. Meanwhile, in the identification of some certain kind of mixed stains, X-STRs can be used to exclude male individuals from the same paternal line. Moreover, because of the unique genetic inheritance of the X chromosome, the use of X-STRs is of great benefit in the identification of victims of war or mass disaster4.

There are several X-STR multiplexes being applied to forensic cases and genetics research. Among them, the Investigator Argus X-12 kit (Qiagen, Hilden, Germany), which includes 12 X-chromosomal STR markers as well as the amelogenin locus, has been used worldwide in a number of population sets5,6,7,8,9. To our knowledge, five Chinese Han subpopulations10,11,12,13,14 and eight ethnic minority groups from China15,16,17,18,19 have been previously studied using this kit. There are 56 ethnic groups in China, which has the largest population and third largest territory in the world. Each ethnic group has different geographical distributions, religions, customs, etc. Throughout history, some of these groups have been relatively isolated from each other, which may have resulted in relatively stable genetic differences between populations in defined areas. Therefore, for a better forensic application of the 12 X-STRs, it is important to understand the similarities or differences in genetic distance between different Chinese populations.

The purpose of this study was to investigate the frequency distribution data and forensic parameters of 12 X-STR loci in Mongolian and Eastern Chinese Han unrelated individuals as well as to illustrate the X-chromosomal evolutionary relationships between different Chinese populations and in comparison to other global populations. Additionally, this study enriched the X-STR database in order to highlight its significance in forensic identification and kinship analysis in different ethnic populations of China.

Materials and Methods

Population samples and DNA extraction

More than 70% of China’s Mongolian population lives in Inner Mongolia, which is located on the Eurasian continent20. Eastern China also has a large Mongolian population and accounts for 29.3% of the total population of China. In our study, samples from 541 healthy unrelated individuals (116 males and 116 females for the Mongolian samples, 200 males and 109 females for the Eastern Chinese Han samples) were collected. Blood samples were obtained from volunteer donors had information available under informed consent, following the protocols approved by the ethics committee at the Academy of Forensic Science, Ministry of Justice, P.R. China. DNA isolation was carried out by a Chelex-100 extraction protocol21. All the methods were carried out in accordance with the approved guidelines of the Academy of Forensic Sciences, Ministry of Justice, P.R. China.

PCR amplification and capillary electrophoresis

All DNA samples and positive control samples (9947 A and XX28) were amplified with the Investigator Argus X-12 kit (Qiagen, Hilden, Germany) on a GeneAmp PCR System 9700 Thermal Cycler (Thermo Fisher Scientific, MA, USA), according to the manufacturer’s protocol. PCR products were detected by capillary electrophoresis on an ABI PRISM 3130XL Genetic Analyzer (Thermo Fisher Scientific, MA, USA) with the protocol guide for genotyping and the allele designation was analyzed using the Genemapper ID v3.2.1 software (Thermo Fisher Scientific, MA, USA).

Statistical analysis and population comparisons

Allelic frequencies of the 12 X-STRs were calculated with CERVUS software/PowerStats v1.2.xls22 for female samples and were counted for male samples. Arlequin v3.5.223 was used to calculate haplotype frequencies of the four linkage groups in male samples; Hardy-Weinberg equilibrium (HWE) in female samples; the exact test of population differentiation based on allelic frequencies of males and females; linkage disequilibrium (LD) in males and females; and pairwise Fst genetic distances per locus among Mongolian, Eastern Chinese Han and nine other published populations from China10,11,12,13,14,16,17,18,19. Forensic parameters for 12 X-STRs including polymorphism information content (PIC), homozygosity (HOM), heterozygosity (HET), power of exclusion (PE), paternity index (PI), paternity exclusion chance in duos and trios (MECD and MECT)24, as well as power of discrimination for males (PDM) and females (PDF) were calculated with pooled allelic frequencies of the two populations using the online tool of the Forensic ChrX Research database (http://www.chrx-str.org).

China map was generated by and Package (ggplot2) and (maptools) of R: A Language and Environment for Statistical (https://www.R-project.org). The evolutionary history of eleven Chinese populations was inferred using the Unweighted Pair Group Method with Arithmetic mean (UPGMA) method25, and the optimal tree with the sum of branch length = 0.00821881 was shown (next to the branches). The evolutionary distances were used for phylogenetic analyses conducted in MEGA7 software26. Detailed population genetic structure analysis was performed with STRUCTURE v2.3.427,28 under the condition of the admixture model and correlated allelic frequencies between populations. We analyzed the structure of Mongolian/Eastern Chinese Han populations and nine other populations from previous studies based on the same 12 X-STRs. The analysis results were uploaded to the Structure Harvester29 to detect the true K value of the data. Further comparisons among different male populations based on the Nei’s unbiased genetic distance30,31 were calculated and Principal Coordinate Analysis (PCoA, known previously as PCA)32 was performed using GENALEX 6.5 software.

Data Availability

All data generated during this study are included in this published article (and its Supplementary Information files) and from the corresponding author on reasonable request.

Results and Discussion

HWE and LD test within populations

The results of HWE and LD tests in Eastern Chinese Han were presented in our previous report33. In the Mongolian samples, a total of 164 alleles were typed using the Argus X-12 kit and no statistically significant deviation from HWE was observed except for the DXS10134 locus in female samples (p = 0.0027) even after applying the Bonferroni’s correction for multiple testing (p = 0.05/12). This deviation from HWE may be caused by the substructure of the chosen population, which could result in a bias of the samples. In addition, no statistically significant LD was found in Mongolian females after Bonferroni’s correction (p = 0.05/66), which was different from the Eastern Chinese Han analysis. This might be due to the rigorous statistical methods. On the other hand, statistically significant LD was found between DXS10101 and DXS10103 loci in Mongolian males (p < 0.0001), which has also been reported in other populations5,34,35.

Haplotype analysis in males

The 12 X-STR markers are clustered into 4 linkage groups based on their physical localizations: LG1 (Xp22) DXS8378-DXS10135-DXS10148, LG2 (Xq11) DXS7132-DXS10074-DXS10079, LG3 (Xq26) DXS10101-DXS10103-HPRTB, LG4 (Xq28) DXS7423-DXS10134-DXS10146, and thus each cluster of 3 markers is considered as one haplotype for the genotyping of males. The numbers of observed haplotypes in Mongolian males were 99, 71, 76 and 81 for LG1, LG2, LG3 and LG4, respectively (Supplementary Table S1); while in Eastern Chinese Han males the numbers were 159, 94, 101 and 114 for LG1, LG2, LG3 and LG4, respectively (Supplementary Table S2). Haplotype diversity (HD) of the four LGs varied from 0.9901 (in LG2) to 0.9972 (in LG1) in Mongolian males and varied from 0.9876 (in LG3) to 0.9973 (in LG1) in Eastern Chinese Han males. This illustrated that LG1 was the most polymorphic group with a frequency of 0.0259 for the most common haplotype in Mongolian males, 10-21-24.1 and 10-21-25.1 (for DXS8378-DXS10135-DXS10148), and a frequency of 0.0200 for the most common haplotype in Eastern Chinese Han males, 10-25-27.1. Haplotype 12-16-32 (for HPRTB-DXS10103-DXS10101) was the most common, observed 12 times in LG3 of the Eastern Chinese Han samples with a population frequency of 0.0600. The results showed that the four closely linked X-STR groups are highly informative genetic markers in the two studied populations.

Genetic variability

The allelic frequency and forensic parameters of Eastern Chinese Han were also presented in our previous report33. Based on the calculated allelic frequencies of 12 X-STRs of Mongolian females and males, no significant difference was found between them (p > 0.05) by the Exact Test; therefore, males and females were pooled for calculating forensic parameters. The allelic frequencies of 12 X-STRs are shown in Supplementary Table S3. The results indicated that all 12 X-STR markers were polymorphic. The DXS10135 locus showed the highest forensic efficiency in both Mongolian (PIC = 0.9070) and Eastern Chinese Han populations, while DXS7423 was the least informative in both Mongolian (PIC = 0.4597) and Eastern Chinese Han populations. HET values varied from 0.5268 (DXS7423) to 0.9134 (DXS10135) in the Mongolian population. All the forensic parameters of 12 X-STRs for the Mongolian population are shown in Table 1. In total, the combined PDF, PDM, MECD and MECT were calculated as exceeding 0.9999999999, 0.9999999983, 0.9999987070 and 0.9999999931, respectively. All these forensic parameters demonstrated that the 12 X-STR markers were highly polymorphic and will be useful in forensic applications or anthropological research among the Mongolian population of Inner Mongolia and the Eastern Chinese Han population.

Table 1 Forensic parameters calculated with pooled allele frequencies of the Mongolian population.

Inter-population comparison

For a better understanding of the genetic structures of these 12 X-STRs in Chinese populations, we assembled data from five Chinese Han subpopulations and four ethnic minority groups from different areas of China (Fig. 1a, Supplementary Table S4). Detailed genotyping data and allelic frequencies of these populations assayed by Investigator Argus X-12 kit were available in previously published reports. By calculation and comparison, we noticed that all the combined PDF of the 11 populations were greater than 0.999999999, except for the Guangdong Yao ethnic group (cPDF = 0.999999998); the combined PDM ranged from 0.999999997 to 0.999999999; and the combined MECD and MECT ranged from 0.999998065 to 0.999999993 and 0.999998732 to 0.999999998, respectively. The abovementioned forensic parameters perfectly illustrated that the 12 X-STRs had great application value in forensic identification and kinship analysis among Chinese populations.

Figure 1
figure 1

(a) Geographical distribution of 11 populations on the map of China, which was generated by Package (ggplot2) and (maptools) of R: A Language and Environment for Statistical (https://www.R-project.org). The areas marked by asterisks show the populations in our study. (b) Evolutionary relationships of 11 populations in different areas of China shown by a phylogenetic tree that was constructed based on 12 X-STRs.

Next, the allelic frequencies of Mongolian and Eastern Chinese Han populations were compared with nine other Chinese populations. Since the Fst value is usually used for analyzing genetic distances between populations36,37, we calculated the pairwise Fst genetic distances (shown in Supplementary Table S5) among the 11 Chinese populations to study their phylogenetic relationships with each other. Based on these data, inter-population comparisons were performed: Significant differences were observed between Mongolian and Taiwanese populations at 4 loci; Mongolian and Guangdong Yao populations at 3 loci; Mongolian and Eastern Chinese Han populations at 2 loci; and for 1 locus between Mongolian and Han populations of Liaoning and Henan, Mongolian and Liaoning Korean populations, and Mongolian and Guangdong Zhuang populations. No significant difference was observed between Mongolian and Hebei Han, Guangdong Han, or Fujian She populations. The maximum Fst value was 0.0055 for the Guangdong Yao ethnic group and Taiwanese population.

Phylogenetic tree by UPGMA phylograms

Despite the limited number of loci that can provide strong information about the general level of genetic diversity, the phylogenetic analyses indicated that populations that are geographically close tend to have shorter genetic distances compared to those that are geographically far apart, which is in accordance with the conventional knowledge of population genetics38. The visualized results of the observed genetic distances revealed that the Mongolian population in this study was genetically close to the populations from northern China, including the northern Han population and Korean ethnic population in the Liaoning province. On the other hand, the eastern/southern populations of China were not as related to the Mongolian population based on X-STRs analyses, regardless of ethnic identity. Of all the populations presented, the Guangdong Yao ethnic group was relatively far from the other populations according to the phylogenetic tree we constructed (shown in Fig. 1b). These observed results were consistent with the cultural and geographical background and development of the abovementioned populations.

Clustering by STRUCTURE analysis

To further investigate the population genetic structure at an individual level on a worldwide scale, the STRUCTURE program was used to process the genetic data of five representative Chinese populations (Mongolian, Han from Eastern China, Han from Liaoning province, Han from Hebei province, and Korean from Jilin province)10,11,16,33, as well as six extra published populations from European and Asia39,40,41,42,43,44. The range of possible Ks was tested from 2 to 11. According to the Evanno method45, when the real K is indeterminate under the interference of Ks plateaus in the distribution of L (K), it often locates at the modal value of the distribution of ΔK; therefore, the most appropriate K was 2 (Supplementary Figure S1). The clustering of the populations presented an easily distinguishable geographic pattern (Fig. 2). Mongolian, Eastern Chinese Han and the other Chinese populations belonged to cluster 1, while the other six global populations belonged to cluster 2. When K = 3–11, there were no recognizable boundaries within the Chinese populations by STRUCTURE analysis. The same situation was also observed in cluster 2 (the Emirati, Belarusian, Hungarian, Germany, Swedish and Italian populations): the five geographically close Western European populations and the Emirati from United Arab Emirates (UAE), who live in the tri-continental crossroads connecting Africa, Europe and Asia, shared a relatively consistent genetic structure pattern no matter how the K value was assumed (Supplementary Figure S2). These results conformed to the findings of Brissenden et al. based on an autosomal SNP study46 and indicated that the Mongolian and Eastern Chinese Han were inseparable from other Chinese at the genetic level from STRUCTURE analysis.

Figure 2
figure 2

Estimated population genetic structure of eleven different populations at K = 2.

Principal Coordinate Analysis (PCoA)

The genetic pattern of the male X chromosome inherited from the mother, normally referred to as the haplotype data set, makes the genetic distribution of X-STRs in males worth exploring. PCoA provides a way of visualizing the essential patterns of genetic relationships contained in a special matrix (e.g., distance matrix) and allows us to find and plot the major patterns within a multivariate data set. PCoA gives a reasonable assumption at a higher taxonomic level, compared to the algorithms that always assume a hierarchical genetic structure in Tree building methods. In this respect, PCoA is an effective complement to the evolutionary tree. Therefore, PCoA was carried out to test for significant variation in the genetic distribution of 12 X-STR markers among the males of abovementioned eleven populations in the STRUCTURE analyses. As shown in Fig. 3, the first two principal components explained 94.62% of the total variance observed within these populations (the first and second component accounted for 88.02% and 6.60%, respectively). In the PCoA diagram, Mongolian, Eastern Chinese Han and other Chinese populations were clustered together on the right, while other global populations were clustered on the left. Among the European male populations, Hungarian, Swedish and Italian were located in the upper left quadrant, while Germany and Belarusian were clustered together in the lower left quadrant, which was in accordance with the national stratification within Europe in other reports41,43. Meanwhile, the Emirati males from the Arabian Peninsula of Western Asia also located in the lower left quadrant and could not be distinguished from the Europeans by the 12 X-STRs, which was consistent with the result of the STRUCTURE analysis in our study. This may due to the complex biogeography and history background of UAE. Among the cluster of Chinese populations, Hebei Han, Liaoning Han and Korean from Jilin shared a closer genetic distance, which supported the results observed from the phylogenetic tree. Additionally, the Mongolian population was located in the lower right corner, slightly far from other Chinese groups in terms of genetic relationship. Similar results were found in a study of Mongolian Y-STRs, and this probably reflects customs and territory restriction to some extent47. Our analysis indicated that the 12 X-STR loci had some advantages in distinguishing Mongolian males from others by the PCoA method.

Figure 3
figure 3

Principal Coordinate Analysis (PCoA) plot displaying the genetic relationships among five Chinese male populations and six male populations from Europe and Asia based on Nei’s unbiased genetic distance matrix of 12 X-STRs.

The Mongolian is a traditional nomadic nationality mainly distributed in Eastern Asia, it is one of the largest minority nationalities in China, and also the main nation of Mongolia. In addition, the Mongolians are also distributed in Asian and European countries such as Russia. The Mongolian nationality first originated from the eastern bank of the ancient Wangjian River. In the early thirteenth century, the Mongolian ministry, headed by Genghis Khan, unified the Mongolia region, and gradually formed a new national community. In the history of China and even in the world, the Mongolian people have had a profound influence and significant role in the economy, politics and culture. Besides, Eastern China is a densely populated area, the population there accounts for 29.3% of the total population of China, which has the most common characteristics of Chinese. Therefore, they are both very important and particular population groups in China, that’s why our study focused on them. Its particular genetic inheritance that makes the X chromosome valuable for human evolutionary and historical research48,49. On the one hand, there is a single copy of the X chromosome in males, it is possible to detect their haplotypes which are needed to infer the phylogeny of a region. The X/Y chromosome and mitochondrion are candidates for researches that use phylogenies, an advantage that explicates their dominant role in human historical studies50,51,52. On the other hand, recombination on women X chromosome also creates vast quantities of information that is vital for providing a complete view of the history of human populations. Although the bulk of human historical researches have been done with Y chromosome or mitochondrion, X chromosome has the advantage of existing in both genders and much lower rates of genetic drift50. The X-STR markers included in Argus X-12 kit are the most developed and widely used, there have been a great amount of 12 X-STRs data of different populations, by which we can conduct the analysis of population characteristics or geographical studies. Therefore, we conducted genetic analysis of Mongolian and Eastern Chinese Han populations based on 12 X-STR markers and achieved significant and prominent results that might be used in more precise substructure detection of Eastern Asian population and the estimation of individual biogeographical ancestry in further study.

Conclusions

Our current study indicated that the Investigator Argus X-12 kit used in Mongolian and Eastern Chinese Han sample sets provided highly polymorphic data for discriminating individuals and testing kinship, which enriched the Chinese ethnic genetic information. The significant deviations from HWE observed at the DXS10134 locus in the Mongolian population could possibly be eliminated by increasing the sample size from geographically closed regions in Inner Mongolia. The genetic comparison among eleven Chinese populations demonstrated that the statistically significant genetic distances were consistent with the known situation of these national mixed, small-settled and staggered populations living in China. In addition, neighboring populations and different ethnic groups within the same area appeared to have closer evolutionary relationships in the phylogenetic tree. The STRUCTURE analyses revealed that the Mongolian and Eastern Chinese Han populations belonged to the same cluster as the other Chinese populations, but were apparently distinct from other global populations. The PCoA analysis indicated that the 12 X-STRs could not only be applied in distinguishing Chinese males and the males from the abovementioned countries, but also able to show a moderate genetic distance between Mongolian and the other Chinese male populations. To our best knowledge, this is the first report to demonstrate that the 12 X-STR markers could be useful in dealing with the genetic relationship between Mongolian males and others.

However, for some populations, such as the Manchu and Miao with their large populations, or the Kazak with significant differences from other Chinese populations in other genetic markers, there are still no published data concerning the 12 X-STRs panel. Therefore, further studies should be focus on increasing the sample size of current ethnic groups and including more nationalities. Hopefully, our endeavor will help to establish an appropriate and integrated X-STR database that is suitable for the large population of China.