Genetic structure of Tibetan populations in Gansu revealed by forensic STR loci

The origin and diversification of Sino-Tibetan speaking populations have long been debated. However, the limited genetic information available for Tibetan populations leaves this topic far from resolved. In the present study, we genotyped 15 forensic autosomal short tandem repeats (STRs) in 803 unrelated Tibetan individuals from Gansu Province (635 from Gannan and 168 from Tianzhu) in northwest China. We combined these data with published datasets to infer the population affinities and genetic substructure of Sino-Tibetan populations in detail. Our results revealed that Tibetan populations in Gannan and Tianzhu are genetically very similar to Tibetans from other regions. The Tibetans in Tianzhu have received more genetic influence from surrounding lowland populations. The genetic structure of Sino-Tibetan populations was strongly correlated with linguistic affiliations. Although the among-population variances are relatively small, the genetic components for Tibetan, Lolo-Burmese, and Han Chinese were quite distinctive, especially for the Deng, Nu, and Derung of Lolo-Burmese. Han Chinese, but not Tibetans, appear to share a substantial genetic component with southern natives, such as Tai-Kadai and Hmong-Mien speaking populations, and with other lowland East Asian populations, which implies that there might have been extensive gene flow between those lowland groups and Han Chinese after Han Chinese separated from Tibetans. The dataset generated in the present study is also valuable for forensic identification and paternity tests in China.

Scientific Reports | 7:41195 | DOI: 10.1038/srep41195
haplogroup A, D, G, and M8 [3][4][5]. The genetic relics of the Late Paleolithic ancestors of Tibeto-Burman populations have also been reported, such as haplogroup M62 5. Y-chromosome evidence suggests that Tibeto-Burman populations are an admixture of the northward migrations of East Asian initial settlers carrying haplogroup D-M174 in the Late Paleolithic Age and the southward migrations of Di-Qiang people with the dominant haplogroups O3a2c1*-M134 and O3a2c1a-M117 in the Neolithic Age [6][7][8]. Haplogroups O3a2c1*-M134 and O3a2c1a-M117 are also characteristic lineages of Han Chinese, comprising 11.4% and 16.3%, respectively 9,10. However, another dominant paternal lineage of Han Chinese, haplogroup O3a1c-002611, is found at very low frequencies in Tibeto-Burman populations, suggesting that this lineage might not have participated in the formation of Tibeto-Burman populations 6,[9][10][11]. Sex-biased admixture has also been observed during the formation of Tibeto-Burman populations. Southern Tibeto-Burman populations exhibit a stronger influence of northern immigrants on the paternal lineages and a more extensive contribution of southern natives to the maternal lineages 12. Likewise, the southern natives have made a greater contribution to the maternal lineages of southern Han Chinese 13. Tibeto-Burman populations tend to cluster with North Asian and Tai-Kadai populations rather than with Han Chinese based on the frequency data of 15 autosomal short tandem repeats (STRs) 14. A genome-wide study from the PanAsia SNP project reveals that Han Chinese populations show varying degrees of admixture between a northern Altaic cluster and a Sino-Tibetan/Tai-Kadai cluster 15. However, Tibetan populations were not included in the PanAsia project.
The analyses of more than 30 deeply sequenced genomes of Tibetans from the Tibet Autonomous Region give results consistent with the Y-chromosome evidence: most of the Tibetan gene pool diverged from that of Han Chinese about 15 kya to 9 kya. The shared ancestry of Tibetan-enriched sequences dates back to 62-38 kya, representing Paleolithic colonization of the plateau 16. An ancient DNA study of Nepalese genomes from the Chokhopani, Mebrak and Samdzong sites, spanning 3 to 1 kya, demonstrates that the Tibetan Plateau experienced millennia of genetic continuity that continues to the present day 17.
From previous studies, the origin of Sino-Tibetan populations seems to involve substantial genetic admixture with surrounding populations. However, the limited number of mtDNA and Y-chromosome markers, together with the small sample sizes and insufficient sampling of genome-wide studies, is far from enough to give a comprehensive understanding of the genetic history and admixture process of Sino-Tibetan populations. In addition, Tibetan populations of Gansu province, the key area for the diversification of Amdo Tibetans, have seldom been studied genetically. Therefore, we analyzed 15 autosomal STRs in 635 and 168 unrelated individuals from two Tibetan populations in Gannan and Tianzhu of Gansu province, respectively, to explore the genetic structure of Tibetan populations in northwest China and to test the population affinities and the level of admixture of Sino-Tibetan populations with surrounding populations.

Methods
We collected blood samples from 635 and 168 unrelated individuals from two Tibetan populations in Gannan and Tianzhu, Gansu province, respectively. Our study was approved by the Ethics Committee of Gansu Institute of Political Science and Law and was conducted in accordance with the human and ethical research principles of that institute. All individuals were adequately informed and signed informed consent before their participation. For each sample, genomic DNA was extracted according to the Chelex-100 method and proteinase K protocol 18. The 15 most widely used forensic STR loci (D8S1179, D21S11, D7S820, CSF1PO, D3S1358, D13S317, D16S539, D2S1338, D19S433, vWA, D18S51, D5S818, FGA, D6S1043 and D12S391) were amplified simultaneously using the AmpFlSTR Sinofiler PCR Amplification Kit (Applied Biosystems, Foster City, CA, USA). The PCR products were analyzed with the 3500XL DNA Genetic Analyzer and Genemapper ID-X software (Applied Biosystems, Foster City, CA, USA).
Allele frequency, heterozygosity, polymorphism information content (PIC), discrimination power (DP), and probability of paternity exclusion (PPE) were calculated using PowerStats V12 (http://www.promega.com/). Tests for Hardy-Weinberg equilibrium were performed in Arlequin v3.5.1.3 19. Because the statistical analyses in this study were based on a Bayesian clustering algorithm, raw genotypic data of 13 STRs (excluding D6S1043 and D12S391) from 59 populations worldwide were extracted to determine population affinity 14,20-50. Analysis of molecular variance (AMOVA), average number of pairwise differences, pairwise Fst, Slatkin's linearized Fst, and coancestry coefficients were all calculated in Arlequin v3.5.1.3 19 using genotype data. The detailed population genetic structure was inferred using the model-based clustering method implemented in Structure 2.3.4 51,52 under the assumptions of admixture, the LOCPRIOR model, and correlated allele frequencies. Each run used 100,000 estimation iterations after a burn-in of 20,000 iterations, for K = 2 to 12, with several replicates. Posterior probabilities for each K were computed for each set of runs. Matrix plots of genetic distance and plots of population structure were produced in R statistical software v3.0.2 53 and Distruct v1.1 54.
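As an illustration of the single-locus parameters listed above, the following Python sketch computes them from a list of diploid genotypes. It is not the PowerStats implementation: the function name `locus_parameters`, the dictionary keys, and the toy genotypes are hypothetical, and the formulas are the commonly used textbook definitions (PIC after Botstein et al., DP as one minus the sum of squared observed genotype frequencies, and PPE as h^2(1 - 2hH^2), with h and H the observed heterozygote and homozygote proportions). Whether the software applies exactly these definitions or further corrections is an assumption here.

```python
from collections import Counter
from itertools import combinations


def locus_parameters(genotypes):
    """Single-locus forensic parameters from diploid genotypes.

    genotypes: list of (allele_a, allele_b) tuples, one per individual.
    Textbook formulas only; an illustrative sketch, not PowerStats itself.
    """
    n = len(genotypes)

    # Allele frequencies: each individual contributes two allele copies.
    allele_counts = Counter(allele for g in genotypes for allele in g)
    freqs = {a: c / (2 * n) for a, c in allele_counts.items()}

    # Observed heterozygote (h) and homozygote (H) proportions.
    h = sum(1 for a, b in genotypes if a != b) / n
    big_h = 1.0 - h

    # Expected heterozygosity under Hardy-Weinberg proportions.
    expected_het = 1.0 - sum(p * p for p in freqs.values())

    # PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2.
    pic = expected_het - sum(
        2.0 * (p * q) ** 2 for p, q in combinations(freqs.values(), 2)
    )

    # DP = 1 - sum of squared observed genotype frequencies
    # (genotypes are unordered, so (10, 11) and (11, 10) are pooled).
    genotype_counts = Counter(tuple(sorted(g)) for g in genotypes)
    dp = 1.0 - sum((c / n) ** 2 for c in genotype_counts.values())

    # PPE = h^2 * (1 - 2 * h * H^2).
    ppe = h * h * (1.0 - 2.0 * h * big_h * big_h)

    return {
        "allele_freqs": freqs,
        "observed_het": h,
        "expected_het": expected_het,
        "pic": pic,
        "dp": dp,
        "ppe": ppe,
    }


if __name__ == "__main__":
    # Hypothetical toy genotypes at one locus (not data from this study).
    toy = [(10, 11), (10, 10), (11, 12), (10, 12)]
    for name, value in locus_parameters(toy).items():
        print(name, value)
```

With real data, the function would be applied once per locus and per population; alleles only need to be of a consistent, sortable type (for example all integers or all strings).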

Results
Forensic parameter analysis. Fifteen STR loci were genotyped in two populations sampled from Gannan and Tianzhu of Gansu province. Their allele frequencies, along with a number of genetic and forensic parameters of interest, are provided in Supplementary Tables 1 and 2. No significant deviation from Hardy-Weinberg equilibrium was observed, suggesting that our samples represent the populations well. The loci in both populations were highly discriminating, with DP ranging from 0.852 to 0.974, demonstrating that those loci are useful for forensic identification.
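To give a sense of what the per-locus DP range means for the full panel, the sketch below combines loci with the product rule. It assumes the 15 loci are statistically independent, which the text does not state, and it uses only the reported minimum and maximum DP as bounds, since the per-locus values are in the supplementary tables. The function name `combined_discrimination` is hypothetical.

```python
import math


def combined_discrimination(dp_values):
    """Combine per-locus DP values under the product rule.

    Assumes independent loci. Returns (match_probability, combined_dp),
    where match_probability is the product of the per-locus (1 - DP).
    """
    match_probability = math.prod(1.0 - dp for dp in dp_values)
    return match_probability, 1.0 - match_probability


# Bounds from the reported range only: every locus has 0.852 <= DP <= 0.974.
N_LOCI = 15
worst_match, worst_dp = combined_discrimination([0.852] * N_LOCI)
best_match, best_dp = combined_discrimination([0.974] * N_LOCI)

if __name__ == "__main__":
    print("upper bound on combined match probability:", worst_match)
    print("lower bound on combined match probability:", best_match)
```

Even in the least favourable case, where every locus sits at the lower end of the range, the combined random match probability is below 1e-12 under the independence assumption. The match probability is reported rather than the combined DP because the latter is numerically indistinguishable from 1 at this scale.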
Interpopulation genetic distances. We calculated various measures of genetic diversity and distance to infer population structure between Tibetans in Gannan and Tianzhu, and compared them with previously studied populations. The Tibetans in Gannan and Tianzhu fall into the general profile of Tibetan groups, showing extremely small genetic distances from other Tibetan populations. The within-population component of genetic variation, estimated here as 99.14% (Table 1) Table 3).
Clustering by structure analysis. We then applied the model-based clustering algorithm in Structure to infer the detailed genetic ancestry at the individual level. This approach places individuals into K clusters, where K is set in advance but can be varied. The results for K = 2 to 7 are shown in Fig. 2.
Previous studies, especially those using mtDNA and the Y chromosome, had suggested a North Asian origin of Tibetan populations 55,56. Our results show that the Tibetans are quite distinct from Siberian populations. The Siberian populations, such as Buryat, Altay, Tofalar, Sojot, and Khakas, share substantial genetic components with European groups, and these components are rarely seen in Tibetan populations. The results are consistent with genome-wide evidence that there is no significant gene flow from West Eurasians into Tibetans 14,15. We suspect that the proposed northern ancestral group that led to present-day Tibetan populations probably separated from the lineage that later became the East Asian part of the Siberian groups before the Siberian groups were extensively admixed with West Eurasian lineages. We caution that the geographical distribution of past populations is probably not accurately reflected in present-day distributions. An important direction for future work is to resolve the exact phylogenetic relationship between the proposed ancient population branch leading to present-day Tibetan populations and other extant Eurasian groups by sequencing ancient samples from the Tibetan Plateau and the Upper-Middle Yellow River Basin.
The genetic makeups of the Tai
Note that the bold names "Tibetan", "Lolo-Burmese", "Han Chinese", "northwest China", "Korean", "Siberian", "Tai-Kadai Hmong-Mien", "European", and "African" refer to the group classifications of present-day populations based on language and geographic affinity. Those names are not the labels for the inferred ancestral populations in the Structure analysis.