Genetic diversity and phylogenetic characteristics of Chinese Tibetan and Yi minority ethnic groups revealed by non-CODIS STR markers

Non-CODIS STRs, which show high polymorphism and allele frequency differences among ethnically and geographically distinct populations, play a crucial role in population genetics, molecular anthropology, and human forensics. In this work, 332 unrelated individuals from Sichuan Province (237 Tibetan individuals and 95 Yi individuals) were first genotyped with 21 non-CODIS autosomal STRs, and phylogenetic relationships with 26 previously investigated populations (9,444 individuals) were subsequently explored. In the Sichuan Tibetan and Yi, the combined power of discrimination (CPD) values are 0.9999999999999999999 and 0.9999999999999999993, and the combined power of exclusion (CPE) values are 0.999997 and 0.999999, respectively. Analysis of molecular variance (AMOVA), principal component analysis (PCA), multidimensional scaling (MDS) plots and phylogenetic analysis demonstrated that the Sichuan Tibetan have a close genetic relationship with the Tibet Tibetan, and that the Sichuan Yi have a genetic affinity with the Yunnan Bai group. Furthermore, significant genetic differences exist widely between Chinese minorities (most prominently Tibetan and Kazakh) and Han groups, whereas no population stratification is observed among Han populations distributed in Northern and Southern China, which instead form a homogeneous group. These results suggest that the 21 STRs are highly polymorphic and informative in the Sichuan Tibetan and Yi, and are suitable for population genetics and forensic applications.


Results
Genetic parameters of the 21 non-CODIS STRs. Population genetic structures in East Asia are complex, especially in China, where population stratification involves 56 officially recognized ethnic groups widely distributed across 34 administrative divisions. Different ethnic groups are believed to have either their own distinct origins or common ancestors. A clear understanding of the genetic variation and forensic characteristics of different ethnic populations is indispensable in forensic applications, especially paternity testing and individual identification. In the present study, a total of 332 unrelated individuals residing in Sichuan Province were genotyped using a multiplex assay amplifying 21 non-CODIS autosomal STR loci (AGCU 21 + 1 System). The detailed genotypes of the 237 Tibetan individuals and 95 Yi individuals are presented in Supplementary Tables S1 and S2, respectively. Forensic parameters, including observed heterozygosity (Ho), expected heterozygosity (He), polymorphism information content (PIC), power of discrimination (PD), power of exclusion (PE) and typical paternity index (TPI), for each locus in the two ethnic groups are shown in Table 1. No significant deviations from Hardy-Weinberg equilibrium (HWE) are observed for any of the 21 non-CODIS STRs in either ethnic group after Bonferroni correction (p > 0.0024). No significant linkage disequilibrium between pairwise STR loci (378 pairwise groups) is observed after Bonferroni correction (p > 0.0002), with the exception of the pair D11S4463 and D5S2500 in the Tibetan population and the pair D10S1248 and D6S1017 in the Yi population (Supplementary Tables S3 and S4). In the Sichuan Tibetan population, a total of 183 alleles are identified, with corresponding frequencies ranging from 0.0021 to 0.5401 (Supplementary Table S5). D19S433 has the largest number of alleles (15), whereas D1S1627 has only 6 (Supplementary Figure S1).
The TPI ranges from 1.2474 at D1S1627 to 2.6333 at D19S433. The observed heterozygosity ranges from 0.5992 (D1S1627) to 0.8101 (D19S433), with an average of 0.7258. The three loci with the highest PD are D19S433, D2S1776, and D11S4463, and the combined power of discrimination (CPD) value is 0.9999999999999999999. The loci with the highest and lowest PE are D19S433 (0.6180) and D1S1627 (0.2899), respectively, and the combined power of exclusion (CPE) value is 0.999997.
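The per-locus and combined statistics above follow simple closed forms, which the following minimal sketch illustrates. It assumes the PowerStats-style formulas TPI = 1/(2H) and PE = h²(1 − 2hH²), where h and H are the heterozygote and homozygote proportions, and the product rule across independent loci for the combined values; the text does not spell these formulas out. The heterozygote counts (192 and 142 of 237) are inferred here from the reported Ho values rather than taken from the supplementary tables, and all function names are illustrative.

```python
from fractions import Fraction
from math import prod


def typical_paternity_index(het, n):
    """TPI = 1 / (2H), H = homozygote proportion (assumed PowerStats-style formula).

    Raises ZeroDivisionError if every individual is heterozygous (H = 0).
    """
    hom = Fraction(n - het, n)
    return float(1 / (2 * hom))


def power_of_exclusion(het, n):
    """PE = h^2 * (1 - 2*h*H^2), h = heterozygote and H = homozygote proportion."""
    h = het / n
    hom = 1 - h
    return h * h * (1 - 2 * h * hom * hom)


def combined_power(per_locus):
    """Product rule 1 - prod(1 - x_i), assuming independent loci.

    Values are passed as strings (e.g. "0.6180") and handled as exact
    fractions, because a result such as 1 - 1e-19 rounds to 1.0 in a
    double-precision float.
    """
    return 1 - prod(1 - Fraction(x) for x in per_locus)


if __name__ == "__main__":
    n = 237  # Sichuan Tibetan sample size
    # Heterozygote counts inferred from Ho = 0.8101 and Ho = 0.5992 (assumption).
    for locus, het in (("D19S433", 192), ("D1S1627", 142)):
        print(locus, typical_paternity_index(het, n), power_of_exclusion(het, n))
    # Hypothetical per-locus PE values, only to show the combination step.
    print(combined_power(["0.6180", "0.2899", "0.5000"]))
```

With these inputs the two functions reproduce the reported TPI (2.6333 and 1.2474) and PE (0.6180 and 0.2899) values to four decimals. Exact fractions are used for the combined values because a CPD with nineteen leading nines cannot be distinguished from 1 in ordinary floating-point arithmetic.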

In the Sichuan Yi population, a total of 149 alleles are observed, with corresponding allele frequencies ranging from 0.0053 to 0.5053 (Supplementary Table S6). As shown in Supplementary Figure [...]
Multidimensional scaling analysis. Subsequently, to evaluate the proportion of genetic heterogeneity and homogeneity attributable to Chinese population stratification, we calculated Nei's genetic distances for all 378 pairwise groups (Supplementary Table S9 and Fig. 2). The largest Nei's standard genetic distance is observed between Henan Han1 and Fujian She, and relatively small genetic distances are identified between Han populations from different administrative divisions (0.0013 for Shandong Han vs. Henan Han2, and 0.0018 for both Beijing Han vs. Shandong Han and Beijing Han vs. Henan Han2). Of our two studied populations, the Sichuan Yi are relatively distant from the Fujian She (0.0552) and close to the Shandong Han (0.0216), with a mean distance of 0.0316 ± 0.0095. Similarly, the Sichuan Tibetan are distant from the Fujian She (0.0577) and close to the Lhasa Tibetan (0.0209) and Shandong Han (0.0208), with a mean distance of 0.0322 ± 0.0105. We next constructed a multidimensional scaling plot based on the Nei's genetic distance matrix to explore the population genetic structure of all 9,776 individuals. As shown in Fig. 3, all Han Chinese populations are tightly grouped together at the center of the MDS plot, with the exception of one Han population sampled from Henan Province. Several minority groups, including the Yunnan Bai, Xinjiang Xibe, Hubei Tujia and Aksu Uyghur, are also intermingled with these Han Chinese groups.
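The pairwise distance step can be illustrated with a short sketch. It assumes Nei's standard genetic distance in its usual form, D = −ln(J_XY / sqrt(J_X · J_Y)), with the J terms accumulated over loci, and it uses made-up allele frequencies for two hypothetical populations; the real frequencies are in the supplementary tables, and the text here does not say which software produced the distance matrix. The count of 378 pairs corresponds to all unordered pairs among the 28 populations.

```python
from itertools import combinations
from math import log, sqrt


def nei_standard_distance(pop_x, pop_y):
    """Nei's standard genetic distance between two populations.

    Each population is a list of loci; each locus maps allele -> frequency.
    """
    if len(pop_x) != len(pop_y):
        raise ValueError("populations must be typed at the same loci")
    jx = jy = jxy = 0.0
    for fx, fy in zip(pop_x, pop_y):
        alleles = set(fx) | set(fy)
        jx += sum(fx.get(a, 0.0) ** 2 for a in alleles)
        jy += sum(fy.get(a, 0.0) ** 2 for a in alleles)
        jxy += sum(fx.get(a, 0.0) * fy.get(a, 0.0) for a in alleles)
    # The per-locus averaging factor cancels in the ratio, so sums suffice.
    return -log(jxy / sqrt(jx * jy))


def pairwise_distances(populations):
    """Distance for every unordered pair; `populations` maps name -> loci list."""
    return {
        (a, b): nei_standard_distance(populations[a], populations[b])
        for a, b in combinations(sorted(populations), 2)
    }


if __name__ == "__main__":
    # Hypothetical frequencies at two loci, for illustration only.
    demo = {
        "PopA": [{"10": 0.5, "11": 0.5}, {"7": 0.2, "8": 0.8}],
        "PopB": [{"10": 0.4, "11": 0.6}, {"7": 0.3, "8": 0.7}],
        "PopC": [{"10": 0.1, "12": 0.9}, {"7": 0.6, "9": 0.4}],
    }
    for pair, dist in pairwise_distances(demo).items():
        print(pair, round(dist, 4))
    print(len(list(combinations(range(28), 2))))  # number of population pairs
```

The resulting symmetric matrix is the kind of input an MDS routine would take; the MDS step itself is omitted here because it depends on the software used.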

Discussion
China is currently populated by over 1.3 billion people who belong to at least 56 officially recognized, linguistically and ethnically distinct groups. Genetic studies of Chinese populations from different minority ethnic groups are of great interest because of China's complex demographics, large population size and complex geographical characteristics. Additionally, interest in the evolution, origin and demographic history of modern humans has resurged among population geneticists and medical geneticists, following the successful analysis of ancient nuclear and mitochondrial DNA sequences of Neanderthals and Denisovans [38-40] and of their effects on disease susceptibility in anatomically modern humans and present-day populations [41]. Previous anthropological and genetic studies have provided evidence that the peopling of China is characterized by different ancestral ethnic origins, as shown by maternal lineages (mitochondrial genome) and paternal genetic signatures (Y chromosome) [42-48]. Despite large-scale efforts to investigate patterns of natural selection, estimate individual ancestry and infer the evolutionary history of Chinese populations based on SNPs, InDels, and CODIS STR loci [21-23,49,50], the genetic variation of non-CODIS STRs in the Sichuan Yi and Tibetan populations remains unexplored. In this study, a total of 237 Chinese Tibetan individuals and 95 Yi individuals from Southwest China were genotyped using the AGCU 21 + 1 PCR amplification kit. In addition, the population differentiation analyses included 9,444 individuals in 26 groups, representing 23 distinct administrative divisions and 14 ethnic groups, that had been genotyped using the same kit in previous studies. The final data set is made up of genotypes at 21 non-CODIS STR loci in 9,776 individuals from 28 Chinese populations.
We have analyzed the genetic variation and population structure of the aforementioned populations via analysis of molecular variance, PCA, MDS and phylogenetic analyses.

Forensic features of non-CODIS STRs in Tibetan and Yi. Recently, a large number of commercial kits that include the 13 overlapping CODIS STRs, such as the GlobalFiler™ STR kit [51] (Thermo Fisher Scientific, Carlsbad, USA), the Huaxia™ Platinum PCR Amplification kit [52] (Thermo Fisher Scientific, Carlsbad, USA), the PowerPlex® Fusion kit [53] (Promega, USA) and so on, have been widely used in forensic human identification, paternity testing, and DNA database construction in criminal investigations or missing persons cases. Because these CODIS STR amplification systems share many of their loci, they offer limited gains in forensic efficiency when used to complement one another in complex forensic cases. By contrast, simultaneously testing the 21 non-CODIS STRs included in the AGCU 21 + 1 kit can minimize adventitious matches, increase discrimination power and facilitate data sharing in cases involving mutation, degraded samples and deficiency paternity testing. Several measures of genetic diversity (observed heterozygosity, expected heterozygosity) and forensic statistical indexes of the 21 non-CODIS STRs (PD, PE, PIC and so on) are relatively high in the Sichuan Yi and Sichuan Tibetan populations in the present study. Most previous genetic studies based on sets of SNPs or STRs located on the sex chromosomes show similar results [18,19,54,55]. However, some researchers reported insufficient combined power of discrimination (0.99999999995713) and power of exclusion (0.97746) in the Yi ethnicity [56]. In contrast, the CPEs in the newly studied populations and the previously studied Han group [24] are over 0.99999 and the CPDs are larger than 0.9999999999999999999.
Our findings in these two populations, in combination with those for the previously studied Sichuan Han group [24], demonstrate that the 21 non-CODIS STRs included in the AGCU 21 + 1 PCR amplification kit are highly discriminative and informative in diverse ethnic populations residing in Sichuan Province, West China, and can be used as a complementary tool in complicated forensic cases (parentage identification with mutation, historical human skeletal remains, missing persons investigation and disaster victim identification). In addition, these loci could be integrated into new panels examined on massively parallel sequencing platforms, such as the Ion S5 XL and Illumina MiSeq sequencers. This study also provides the first batch of genetic diversity information for the 21 non-CODIS STRs in these two ethnic groups and enriches the Chinese non-CODIS STR reference databases.
Intra- and inter-population structure. The results of this comprehensive population comparison reveal significant genetic differences between Han Chinese populations and some minority ethnic groups, most notably the Tibetan and She populations. In addition, our phylogenetic reconstruction and MDS analyses indicate that the Han Chinese population is more homogeneous when assessed with autosomal genetic markers than with sex-linked genetic markers [57]. Among Han Chinese populations, no significant differences are observed between populations separated by the geographic boundary of the Yangtze River, a boundary that has been identified using Y-STRs and high-density SNP panels [57]. However, a slight North-South gradient, rather than a significant North-South genetic distinction, can be discerned. The genetic similarities and differences identified among Chinese populations provide a valuable resource for identifying accurate disease risk genes in genome-wide association studies, avoiding spurious associations, and detecting more ethnicity-specific ancestry informative markers for forensic ancestry inference.
The Tibetan and Yi populations belong to the Tibeto-Burman-speaking subfamily of the Sino-Tibetan languages, whereas the previously investigated Southwest Chinese Han population belongs to another subfamily (Chinese). The Tibetan, as one of the most representative groups, are genetically adapted to extreme hypoxia and have been the subject of multidisciplinary studies. Our results reveal that two Tibetan populations living in different geographic settings (high altitude: Tibet, and low altitude: Sichuan) have a strong genetic affinity with each other but remain genetically distant from other populations. These features are consistent with previous findings from genetic studies based on high-throughput genotyping data and genome sequence data [22,50,58]. The Yi population, as we expected, is relatively distant genetically from the Tibetan population residing in Sichuan. In addition, these two Tibeto-Burman-speaking populations are genetically distinct from the Sichuan Han population we investigated previously [24], although all three groups live in close geographical proximity, which is in accordance with their different ethnic origins and cultural backgrounds. Unexpectedly, the Sichuan Yi and Yunnan Yi are genetically distant from each other, which may reflect cultural differences among the three subgroups of Yi (Ni, Lolo, and other). In the future, large-scale population genetic history studies covering different administrative divisions and based on different high-density genetic marker sets (even whole genome sequences of archaic or present-day human DNA) will be needed to investigate and elucidate the origin and migration of the Sino-Tibetan-speaking ethnic groups.

Conclusions
In summary, we sampled 332 individuals from two minority ethnic groups to assess the genetic variation of 21 non-CODIS STR loci, and combined these samples with 9,444 previously investigated individuals from 26 Chinese populations to explore Chinese population structure. Our results demonstrate that this panel of STRs is highly informative and polymorphic in the Sichuan Tibetan and Yi populations, and can be widely used as a tool for personal identification and parentage testing in forensics. Additionally, the estimates of genetic differentiation (Fst and p values, and Nei's genetic distance) suggest that the Sichuan Tibetan and Sichuan Yi populations are closely related to the Lhasa Tibetan and Mongolian populations, respectively, but are relatively isolated from other ethnic groups, especially Han Chinese populations. The results obtained from PCA, MDS and phylogenetic analyses also demonstrate that genetic differences between Han Chinese and minorities are widespread, and that Han Chinese populations are homogeneous across different geographical divisions.

Methods
Ethics statement. This study was approved by the institutional review boards of Sichuan University. All participants signed informed consent statements prior to participation. Human blood samples were collected upon approval of the Ethics Committee at the Institute of Forensic Medicine, Sichuan University. All the experimental procedures and the methods for each procedure were carried out in accordance with the approved guidelines of the Institute of Forensic Medicine, Sichuan University.
Sample preparation. Blood samples were collected from 237 unrelated Tibetan individuals (120 males and 117 females) recruited from Chengdu and 95 unrelated Yi individuals (55 males and 40 females) recruited from Liangshan Yi Autonomous Prefecture, Sichuan Province. All individuals were required to be indigenous inhabitants, or to have recent ancestors who had resided in the corresponding sample collection region for at least three generations.
Human genomic DNA was extracted using the PureLink Genomic DNA Mini Kit (Thermo Fisher Scientific) according to the manufacturer's protocol. The quantity of the DNA template was determined using the Quantifiler Human DNA Quantification Kit on a 7500 Real-time PCR System (Thermo Fisher Scientific). DNA samples were normalized to 1.0 ng/μL and stored at −20 °C until amplification.
PCR amplification and genotyping. PCR amplification was performed following the manufacturer's protocol on a ProFlex 96-well PCR System (Thermo Fisher Scientific). Each 25 μL reaction contained 10 μL of Reaction Mix, 5 μL of Primers 21 + 1, 0.75 μL of C-Taq and 1.0 ng of template DNA. The thermal cycling conditions consisted of an initial step at 95 °C for 2 min, followed by 30 [...] Amplification products were separated and detected on an Applied Biosystems 3130 Genetic Analyzer following the manufacturer's recommendations. One microliter of PCR product or allelic ladder was added to a mixture containing 9.5 μL of deionized Hi-Di formamide and 0.5 μL of AGCU Marker Size-500 (AGCU ScienTech Incorporation). The mixture was injected at 1.2 kV for 16 s and electrophoresed at 13 kV for 1550 s at a run temperature of 60 °C. Allele calling was carried out with GeneMapper ID 3.20 analysis software using the allelic ladder and the set of bins and panels provided with the kit.
Population studies. To evaluate the forensic efficiency of this non-CODIS STR panel in the Sichuan Tibetan and Yi, genotype data of 332 unrelated individuals were analyzed. The observed heterozygosity (Ho) and expected heterozygosity (He) were estimated, and exact tests of Hardy-Weinberg equilibrium (HWE) and linkage disequilibrium (LD) were performed, using Arlequin 3.5.2.2 [59]. Allelic frequencies and forensic parameters, including the polymorphism information content (PIC), power of discrimination (PD) and power of exclusion (PE), were calculated using the modified PowerStat V12 spreadsheet (Promega) [60].
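As an illustration of what these tools compute, the following minimal sketch evaluates expected heterozygosity, PIC and a Bonferroni threshold from first principles. It assumes the textbook forms He = 1 − Σp², optionally scaled by 2N/(2N − 1) as a small-sample correction, and PIC = 1 − Σp² − ΣΣ 2p_i²p_j²; it is not the Arlequin or PowerStat implementation, and the allele frequencies are made up. The HWE threshold reported in the Results (0.0024) is consistent with 0.05/21, assuming a nominal significance level of 0.05, which the text does not state.

```python
from itertools import combinations


def expected_heterozygosity(freqs, n_individuals=None):
    """He = 1 - sum(p_i^2); if n_individuals is given, scale by 2N / (2N - 1)."""
    he = 1.0 - sum(p * p for p in freqs)
    if n_individuals is not None:
        copies = 2 * n_individuals  # gene copies in a diploid sample
        he *= copies / (copies - 1)
    return he


def polymorphism_information_content(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    homozygosity = sum(p * p for p in freqs)
    cross = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


if __name__ == "__main__":
    # Hypothetical allele frequencies at one locus, for illustration only.
    freqs = [0.40, 0.30, 0.20, 0.10]
    print(expected_heterozygosity(freqs))
    print(expected_heterozygosity(freqs, n_individuals=237))
    print(polymorphism_information_content(freqs))
    # Assumed alpha = 0.05 across the 21 loci tested for HWE.
    print(round(bonferroni_threshold(0.05, 21), 4))
```

PIC is always slightly lower than the uncorrected He for the same frequencies, because it additionally discounts matings in which the offspring genotype is uninformative.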
Additionally, [...]
Quality control. All experiments were conducted at the Forensic Genetics Laboratory of the Institute of Forensic Medicine, Sichuan University, which is an ISO 17025 accredited laboratory and has been accredited by the China National Accreditation Service for Conformity Assessment (CNAS). We strictly followed the recommendations of the Chinese National Standards and the Scientific Working Group on DNA Analysis Methods (SWGDAM) [65]. Control DNA 9947A (AGCU ScienTech Incorporation) and ddH2O were used as positive and negative controls, respectively, for each batch of genotyping.