Introduction

Short tandem repeats (STRs) are DNA fragments containing a variable number of tandemly repeated short (2–6 bp) sequence motifs1, such as (GATA)n. STR genotyping has been applied to human identification in forensic investigations for nearly 30 years2,3. Recently, to minimize adventitious matches and to facilitate data sharing, the core set of the Combined DNA Index System (CODIS) was expanded from 13 to 20 loci (the expanded CODIS)4. A parallel process occurred in China, where 20 loci are required for uploading DNA profiles to the Chinese National Database (CND), the world’s largest DNA database5,6. However, six loci are not shared between the expanded CODIS and the CND6,7. The Huaxia Platinum System (Thermo Fisher Scientific, MA, USA) is a 25-locus, six-dye multiplex that allows co-amplification and fluorescent detection of 23 autosomal loci (D1S1656, D2S1338, D2S441, D3S1358, D5S818, D7S820, D8S1179, D10S1248, D12S391, D13S317, D16S539, D18S51, D19S433, D21S11, D22S1045, D6S1043, CSF1PO, FGA, TH01, TPOX, VWA, Penta D and Penta E), which include all recommended core loci of the expanded CODIS and the CND, together with Amelogenin and a Y-InDel (rs2032678) for sex determination8. Our previous validation study demonstrated that this assay is robust, sensitive, specific and reliable, and that its loci are polymorphic and informative in three main ethnic groups of China: Sichuan Han, Xinjiang Uyghur and Tibet Tibetan7.

China, the world’s most populous country and the world’s second-largest state by land area, emerged as one of the world’s earliest civilizations in the fertile basin of the Yellow River in the North China Plain. The nation officially recognizes 56 distinct ethnic groups, widely distributed across 34 administrative regions, which further complicates the population substructure9,10. The Han population, accounting for 92% of the total population of China, is widely distributed across mainland China. The Yi people are a typical ethnic minority in China and the largest ethnic minority group in Sichuan Liangshan Yi Autonomous Prefecture. Most of them live in mountainous regions and often carve out their existence on the sides of steep mountain slopes. Tibetans are one of the oldest peoples in China and South Asia. They have resided mainly on the Qinghai-Tibetan Plateau for hundreds of generations and carry genetic adaptations associated with a distinct combination of phenotypes at high altitude (>4000 m). With economic growth and improved transport, some Tibetans have begun to migrate to lowland areas. Sichuan is home to a large Tibetan community, with 30,000 permanent Tibetan residents and a floating Tibetan population of up to 200,000.

Continuing our previous studies7,11, the present study characterizes the genetic diversity of the Huaxia Platinum System loci in three main ethnic groups of China (193 Hainan Han, 198 Sichuan Tibetan and 177 Sichuan Yi individuals). Additionally, genetic data from the individuals investigated here and from 56 previously studied Chinese populations7,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44 were used to investigate genetic relationships along administrative and ethnic divisions.

Results and Discussion

Genetic parameters of the Huaxia Platinum System

Hainan, separated by the Qiongzhou Strait from the Leizhou Peninsula of Guangdong, is the smallest and southernmost province of China. The population density of Hainan is low compared with most Chinese coastal provinces. This study provides the first batch of 23-STR data for the Hainan Han population (193 unrelated individuals) generated with the Huaxia Platinum System. We also continued to evaluate the forensic efficiency of this assay in two main ethnic groups in Sichuan. Sichuan consists of two geographically distinct parts: the eastern part lies mostly within the fertile Sichuan basin, and the western part consists of numerous mountain ranges. Han Chinese, the majority of the province’s population, mainly reside in the eastern portion, while significant minorities of Yi and Tibetan people reside in the western portion, which is affected by inclement weather and natural disasters. Sichuan Han were investigated in our previous studies11; in the present study, Sichuan Yi (177 unrelated individuals) and Sichuan Tibetan (198 unrelated individuals) were analyzed.

No significant deviation from Hardy-Weinberg equilibrium was observed (Table 1), and no significant linkage disequilibrium was detected between pairwise STR loci after Bonferroni correction in the three ethnic groups (Supplementary Tables 1–3). The allele frequency distributions are listed in Supplementary Tables 4–6, and forensic parameters including observed heterozygosity (Ho), expected heterozygosity (He), power of discrimination (PD), power of exclusion (PE) and typical paternity index (TPI) for each locus are presented in Table 1.
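As a rough illustration of the Bonferroni correction applied to the pairwise linkage disequilibrium tests, the sketch below computes the per-test threshold for all unordered pairs among the 23 autosomal loci. The family-wise level of 0.05 is an assumption (the text does not state it), and the function name is hypothetical.

```python
from math import comb


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


n_loci = 23                 # autosomal STR loci in the panel (from the text)
n_pairs = comb(n_loci, 2)   # number of unordered locus pairs tested for LD
alpha = 0.05                # ASSUMPTION: family-wise level, not stated in the text

ld_threshold = bonferroni_threshold(alpha, n_pairs)
print(f"{n_pairs} locus pairs, per-test threshold = {ld_threshold:.6f}")
```

Under these assumptions, a pair of loci would be flagged only if its p value fell below the printed threshold within a given population.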

Table 1 Forensic parameters for 23 autosomal STR loci of the Huaxia Platinum system in three Chinese ethnic groups.

A total of 246 alleles were observed in Hainan Han, with allele frequencies ranging from 0.0026 to 0.5207. The PD, PE, TPI, Ho and He ranged from 0.8028 to 0.9840, 0.2290 to 0.8411, 1.0966 to 6.4333, 0.5412 to 0.9227 and 0.6233 to 0.9233, respectively. The combined power of discrimination (CPD) and combined power of exclusion (CPE) were 0.9999999999999999999999999992 and 0.9999999999, respectively. For the Sichuan Yi population, a total of 245 alleles were identified, with allele frequencies ranging from 0.0028 to 0.5057. The PD, PE, TPI, Ho and He ranged from 0.7927 to 0.9794, 0.3060 to 0.7475, 1.2899 to 4.0455, 0.6158 to 0.8757 and 0.6327 to 0.9111, respectively. The CPD and CPE were 0.999999999999999999999999998 and 0.999999998, respectively. For the Sichuan Tibetan population, a total of 231 alleles were detected, with allele frequencies ranging from 0.0025 to 0.5808. The PD, PE, TPI, Ho and He ranged from 0.7677 to 0.9841, 0.2227 to 0.7333, 1.0815 to 3.8269, 0.5404 to 0.8687 and 0.5878 to 0.9207, respectively. The CPD and CPE were 0.999999999999999999999999992 and 0.999999995, respectively. In addition, the CPD and CPE based on the loci covered by the PowerPlex 21 System, GoldenTM DNA ID system 20 A kit and AGCU EX22 kit were estimated separately and are listed in Supplementary Table 7. Taken together, the forensic parameters of these 23 loci indicate that the Huaxia Platinum multiplex system is highly informative and discriminating, as required for forensic DNA genotyping and databasing.
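The combined values above are conventionally obtained by multiplying the per-locus complements, CPD = 1 − Π(1 − PDi) and CPE = 1 − Π(1 − PEi), which assumes the loci are independent. The following is a minimal sketch of that arithmetic, not the authors' spreadsheet. The function name and the toy input values are hypothetical, and `decimal` is used because ordinary floating point cannot distinguish a value such as 1 − 10^-28 from 1.

```python
from decimal import Decimal, getcontext

# Enough significant digits to resolve values that differ from 1 by ~1e-28.
getcontext().prec = 60


def combined_power(per_locus):
    """Return 1 - prod(1 - x_i) for per-locus PD or PE values.

    Values are passed as strings so that Decimal represents them exactly.
    ASSUMPTION: loci are independent, so the complements can be multiplied.
    """
    remaining = Decimal(1)
    for value in per_locus:
        remaining *= Decimal(1) - Decimal(value)
    return Decimal(1) - remaining


# Hypothetical per-locus values, chosen only to illustrate the arithmetic.
toy_pd = ["0.90", "0.80", "0.95"]
print(combined_power(toy_pd))
```

Reporting the complement (the product itself) is often more readable than a long run of nines, since it is directly the combined random match or non-exclusion probability.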

Population pairwise differences

The ethnic groups of China are widely distributed across 34 administrative divisions, and the population substructure is complicated. To clarify the genetic affinity among these groups, population comparisons were performed between our three studied groups and 56 previously investigated populations (40 Han populations, 15 ethnic minorities and the Vietnamese of Yunnan). The locus-by-locus Fst and corresponding p values showed statistically significant differences between Hainan Han and Yunnan Miao at 11 loci, between Hainan Han and Yunnan Vietnamese at 6 loci, and between Hainan Han and Xinjiang Uyghur at 4 loci after Bonferroni adjustment (p < 0.00004) (Supplementary Table 8). As shown in Supplementary Tables 9–10, after Bonferroni correction (p < 0.00004), statistically significant genetic differentiation was found between Sichuan Yi and Yunnan Miao at 10 loci, between Sichuan Yi and Yunnan Vietnamese at 7 loci, and between Sichuan Yi and Inner Mongolia at 4 loci. Significant differences were found between Sichuan Tibetan and Yunnan Miao at 10 loci, between Sichuan Tibetan and Yunnan Vietnamese at 9 loci, and between Sichuan Tibetan and Taiwan Han at 3 loci.

Principal component analysis

Principal component analysis of the 59 groups was carried out on the basis of the allele frequency distributions of 19 STR loci. As displayed in Fig. 1, the first, second and third principal components account for 30.63%, 15.50% and 13.95% of the total variance, respectively. These components clearly differentiate the Tibetan, Uyghur, Kazakh, Miao and Vietnamese groups from the other groups, whereas Han Chinese populations residing in different administrative regions cluster together, which indicates widespread genetic similarity among Han Chinese populations across administrative divisions.

Figure 1

Principal component analysis based on 19 overlapped STR loci of our studied populations (bold and red) and 46 reference Chinese Han populations.

Multidimensional scaling analysis

To further explore the genetic diversity and phylogenetic characteristics of Chinese populations, Nei’s standard genetic distances were calculated among the 59 Chinese populations and are presented in Supplementary Table 11. The largest genetic distances were observed between Miao and our three investigated groups, Hainan Han, Sichuan Yi and Sichuan Tibetan (Rst = 0.0940, Rst = 0.1103 and Rst = 0.1442, respectively), while Guangzhou Han was the most closely related to Hainan Han (Rst = 0.0107), Shanghai Han the most closely related to Sichuan Yi (Rst = 0.0116), and Yunnan Bai the closest to Sichuan Tibetan (Rst = 0.0199).
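For readers unfamiliar with Nei’s standard genetic distance, the sketch below shows how it can be computed from per-locus allele frequencies as D = −ln(Jxy / √(Jx·Jy)). It is an illustration only: it uses the basic 1972 form without the small-sample (unbiased) correction, and the function name, data layout and toy frequencies are hypothetical rather than taken from the study.

```python
from math import log, sqrt


def nei_standard_distance(pop_x, pop_y):
    """Nei's (1972) standard genetic distance between two populations.

    pop_x and pop_y are lists with one dict per locus (same locus order),
    each mapping allele label -> allele frequency.
    NOTE: no small-sample correction is applied in this sketch.
    """
    if len(pop_x) != len(pop_y) or not pop_x:
        raise ValueError("both populations need the same, non-empty set of loci")
    jx = jy = jxy = 0.0
    for fx, fy in zip(pop_x, pop_y):
        alleles = set(fx) | set(fy)
        jx += sum(fx.get(a, 0.0) ** 2 for a in alleles)
        jy += sum(fy.get(a, 0.0) ** 2 for a in alleles)
        jxy += sum(fx.get(a, 0.0) * fy.get(a, 0.0) for a in alleles)
    if jxy == 0.0:
        return float("inf")  # no shared alleles at any locus
    # Averaging each sum over loci would cancel in this ratio.
    identity = jxy / sqrt(jx * jy)
    return -log(identity)


# Hypothetical single-locus frequencies for two toy populations.
toy_a = [{"10": 0.5, "11": 0.5}]
toy_b = [{"10": 1.0}]
print(nei_standard_distance(toy_a, toy_b))
```

Identical frequency profiles give a distance of zero, and the distance grows as the populations share fewer alleles at similar frequencies.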

Furthermore, evolutionary relationships among the 59 Chinese populations were inferred by MDS (Supplementary Fig. 1) on the basis of the genetic distance matrix. As shown in Supplementary Fig. 1, 12 of the 59 populations (3 Xinjiang Uyghurs, Xinjiang Kazakh, Sichuan Tibetan, Tibet Tibetan, Yunnan Miao, Yunnan Vietnamese, Yunnan Dai, Yunnan Zhuang, Yunnan Yi and Yunnan Hani) were isolated at the periphery of the MDS plot, while the other populations clustered together. To further examine the genetic differentiation between our three investigated populations and the 40 reference Han populations, and between our populations and previously reported ethnic minorities, two additional MDS scatter plots were constructed from the genetic distance values (Fig. 2).

Figure 2

Multidimensional scaling (MDS) plots constructed from Nei’s genetic distances, calculated from the allele frequency distributions of 19 overlapping autosomal STRs. (A) MDS of our studied populations (bold and blue) and 40 reference Chinese Han populations (abbreviations are defined in Supplementary Table 12). (B) MDS of our studied populations (bold and blue) and 16 reference ethnic minorities (abbreviations are defined in Supplementary Table 12).

As shown in Fig. 2A, the two studied ethnic minorities (Sichuan Yi and Sichuan Tibetan) clearly diverged genetically from the Chinese Han populations distributed across different administrative regions, and subtle differentiation was found between the Hainan Han and other Han Chinese populations. These findings are in line with the PCA results. The visualization of Nei’s genetic distances between our investigated populations and the reference ethnic minority groups (Fig. 2B) showed marked differences among the minorities. Additionally, the Sichuan Tibetan and Hainan Han were clearly separated from the other groups, whereas the Sichuan Yi clustered with the Yunnan Bai.

Phylogenetic relationship analysis

To further explore the phylogenetic characteristics of Chinese populations, a phylogenetic tree was constructed using the neighbor-joining method. In the dendrogram (Fig. 3), two main clusters were clearly identified: one consisted of Yunnan Miao and Yunnan Vietnamese, and the other comprised the remaining 57 populations. The Hainan Han grouped with the geographically and ethnically close Taiwan Han, while the Sichuan Tibetan first clustered with the Tibet Tibetan and then with the Sichuan Yi. The phylogenetic structure revealed by Nei’s genetic distance matrix agreed with the patterns revealed by PCA and MDS, and was also in line with the results of our previous studies based on Y-chromosomal and X-chromosomal genetic markers45,46,47.

Figure 3

Phylogenetic tree of the three studied populations (red and bold) and 56 reference populations. The tree was constructed by the neighbor-joining method based on 19 overlapping STR loci in MEGA 7.0.
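To give a sense of how the neighbor-joining method chooses which populations to group, the sketch below performs only its first step: it computes the Q criterion, Q(i, j) = (n − 2)·d(i, j) − Σk d(i, k) − Σk d(j, k), and returns the pair with the smallest value. It is a simplified illustration and not the MEGA implementation. The function name and the toy distance matrix are hypothetical.

```python
def nj_first_pair(labels, dist):
    """Return the pair that neighbor-joining would merge first, with its Q value.

    labels: list of taxon names.
    dist:   symmetric distance matrix (list of lists), zeros on the diagonal.
    """
    n = len(labels)
    if n < 3:
        raise ValueError("neighbor-joining needs at least three taxa")
    row_sums = [sum(row) for row in dist]
    best_pair, best_q = None, float("inf")
    for i in range(n):
        for j in range(i + 1, n):
            # Q penalizes pairs that are close to each other but far from the rest.
            q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
            if q < best_q:
                best_pair, best_q = (labels[i], labels[j]), q
    return best_pair, best_q


# Hypothetical five-taxon distance matrix, for illustration only.
toy_labels = ["a", "b", "c", "d", "e"]
toy_dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(nj_first_pair(toy_labels, toy_dist))
```

A full implementation would replace the merged pair with a new internal node, update the distances and repeat until the tree is resolved. Note that the pair with the smallest raw distance (d and e in the toy matrix) is not necessarily the pair selected by the Q criterion.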

To make a comprehensive population comparison based on autosomal genetic markers, we investigated genetic variation in 568 unrelated individuals using 23 autosomal STR loci and explored genetic relationships among 59 Chinese groups distributed across different administrative regions. The genetic data presented here provide basic information on the ethnic and geographical population differentiation required in forensic genetics. Genetic differences within ethnicities were usually of minor magnitude, especially within the Han, who, despite their large sample size, showed prominent genetic homogeneity. In this study, we did not observe the genetic distinction between Northern and Southern Han48 or the North-South genetic gradient10 reported from Y-chromosomal data. A plausible explanation is that the Y chromosome lacks recombination between loci and follows an isolation-by-distance model; in addition, large-scale migrations of the Han in modern times might have further homogenized the genetic landscape of China. We observed substantial genetic divergence between some ethnic groups, most notably Tibetans, Uyghurs, Kazakhs, Miao, Vietnamese and Dai, and most other ethnicities. This can be explained by different ancestries as well as distinct geographical and cultural backgrounds. Moreover, more in-depth statistical analysis of our genetic data and larger sample sizes for some ethnicities are needed in future population genetics studies.

Conclusions

In summary, we provide the first batch of genetic polymorphism data for Hainan Han, Sichuan Yi and Sichuan Tibetan using the 23 autosomal STR loci included in the Huaxia Platinum System. The forensic parameters demonstrate that this 25-plex system is highly polymorphic and informative in the studied populations and can be employed as a powerful tool for forensic applications. The inter-population comparisons, PCA, MDS and phylogenetic analysis showed no significant genetic distinction between northern and southern Han Chinese populations, although subtle divergence was observed between Hainan Han and other Han populations. The same analyses consistently demonstrated significant genetic differentiation between minority ethnic groups (particularly Sichuan Tibetan, Tibet Tibetan, Xinjiang Uyghur, Xinjiang Kazakh, Yunnan Miao, Yunnan Vietnamese, Yunnan Zhuang and Yunnan Dai) and Han populations. This pattern of population substructure may result from large-scale population migration, ethnic intermarriage, random mating and gene flow among different ethnicities, or within one ethnic group living in distinct geographic regions.

Methods

Ethics Statement

Human blood samples were collected upon approval of the Ethics Committee at the Institute of Forensic Medicine, Sichuan University. Written informed consent was obtained from each participant. All the methods were carried out in accordance with the approved guidelines of Institute of Forensic Medicine, Sichuan University. This study was approved by the Ethics Committee of Sichuan University (Approval Number: K2015008).

Sample preparation

A total of 568 peripheral blood samples were collected from 193 unrelated Han Chinese recruited from Hainan Province, 177 unrelated Yi Chinese recruited from Sichuan Liangshan Yi Autonomous Prefecture and 198 unrelated Tibetan Chinese recruited from Sichuan Province.

Human genomic DNA was extracted using the Purelink Genomic DNA Mini Kit (Thermo Fisher Scientific) according to the manufacturer’s instructions. The quantity of the DNA template was determined using Quantifiler Human DNA Quantification Kit (Thermo Fisher Scientific) on a 7500 Real-time PCR System (Thermo Fisher Scientific). DNA samples were then normalized to 1.0 ng/μl and stored at −20 °C until amplification.

Amplification and genotyping

PCR amplification was performed with 27 PCR cycles in a ProFlex PCR System (Thermo Fisher Scientific) following the manufacturer’s protocol. Amplification products were separated and detected on an Applied Biosystems 3500 Genetic Analyzer using POP-4 polymer and a 36 cm capillary array according to the manufacturer’s recommendations. Allele calling was carried out with GeneMapper ID-X v.1.4 software (Thermo Fisher Scientific) using the allelic ladder and the set of bins and panels provided by the manufacturer.

Population studies

To evaluate the forensic efficiency of this STR system in three main ethnic groups of China, genotype data of 568 unrelated individuals (193 Han, 177 Yi and 198 Tibetan) were analyzed. Population indices, including allele frequencies, heterozygosity, Hardy–Weinberg equilibrium (HWE) and linkage disequilibrium (LD) between pairs of loci, were obtained using Arlequin v3.5.2.2 software49. The forensic parameters power of discrimination (PD), power of exclusion (PE) and typical paternity index (TPI) were calculated using a modified PowerStats V12 spreadsheet (Promega)50.
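The per-locus forensic parameters named above can be illustrated with the formulas commonly used for them: PD = 1 − Σ gi² over genotype frequencies gi, PE = h²(1 − 2hH²), and TPI = 1/(2H), where h and H are the observed proportions of heterozygotes and homozygotes. The sketch below is an assumption-laden illustration, not the spreadsheet used in the study. The function name and the toy genotypes are hypothetical.

```python
from collections import Counter


def forensic_parameters(genotypes):
    """Compute Ho, PD, PE and TPI for a single locus.

    genotypes: list of (allele1, allele2) tuples, one per individual.
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("no genotypes supplied")
    # Treat (a, b) and (b, a) as the same genotype.
    counts = Counter(tuple(sorted(g)) for g in genotypes)
    het = sum(c for (a, b), c in counts.items() if a != b) / n  # h, equals Ho
    hom = 1.0 - het                                             # H
    pd = 1.0 - sum((c / n) ** 2 for c in counts.values())
    pe = het ** 2 * (1.0 - 2.0 * het * hom ** 2)
    # TPI is undefined when no homozygotes are observed; report infinity.
    tpi = float("inf") if hom == 0 else 1.0 / (2.0 * hom)
    return {"Ho": het, "PD": pd, "PE": pe, "TPI": tpi}


# Hypothetical genotypes at one locus for four individuals.
toy_genotypes = [("12", "13"), ("13", "12"), ("12", "12"), ("14", "15")]
print(forensic_parameters(toy_genotypes))
```

Because PE and TPI depend only on the heterozygote proportion, loci with higher observed heterozygosity are more useful for paternity testing under these formulas.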

Furthermore, to investigate the phylogenetic relationships among Chinese populations, a comprehensive population comparison among 59 groups7,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44 was conducted using locus-by-locus Fst comparisons based on 19 overlapping STR loci (D2S1338, D3S1358, D5S818, D6S1043, D7S820, D8S1179, D12S391, D13S317, D16S539, D18S51, D19S433, D21S11, CSF1PO, FGA, Penta D, Penta E, TH01, TPOX, VWA), with distances expressed as Slatkin’s linearized Fst51. Pairwise Fst values can be used as short-term genetic distances between populations, with a slight transformation applied to linearize the distance with population divergence time. Detailed information on and abbreviations of the aforementioned populations are given in Supplementary Table 12. The principal component analysis scatter plot was generated with MVSP v3.22 software52, and multidimensional scaling (MDS) analysis was conducted in SPSS software (IBM SPSS, version 19.0, Chicago). The unbiased estimate of Nei’s standard pairwise genetic distance was calculated using the PHYLIP 3.695 package. A neighbor-joining phylogenetic tree was constructed in the Molecular Evolutionary Genetics Analysis 7.0 (MEGA 7.0) software53.
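The "slight transformation" mentioned above is Slatkin’s linearization, Fst / (1 − Fst), which grows approximately linearly with divergence time. A minimal sketch follows. The function name is hypothetical, and the handling of negative Fst estimates (set to zero here) is an assumption rather than something stated in the text.

```python
def slatkin_linearized(fst: float) -> float:
    """Slatkin's linearized distance, Fst / (1 - Fst)."""
    if fst >= 1.0:
        raise ValueError("Fst must be smaller than 1")
    # ASSUMPTION: negative estimates (sampling noise) are treated as zero.
    fst = max(fst, 0.0)
    return fst / (1.0 - fst)


# Illustrative values only: small Fst values change little, large ones stretch.
for value in (0.01, 0.05, 0.20):
    print(value, round(slatkin_linearized(value), 4))
```

For the small Fst values typical of comparisons among closely related populations, the linearized distance is nearly equal to Fst itself.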

Quality control

Control DNA 007 (Thermo Fisher Scientific) and ddH2O were used as positive and negative controls respectively for each batch of amplification and genotyping. All experiments were conducted at the Forensic Genetics Laboratory of Institute of Forensic Medicine, Sichuan University, which is an accredited laboratory (ISO 17025), in accordance with quality control measures. Additionally, the laboratory has been accredited by the China National Accreditation Service for Conformity Assessment (CNAS).