Mitochondrial DNA genomes revealed different patterns of high-altitude adaptation in high-altitude Tajiks compared with Tibetans and Sherpas

High-altitude Tajiks (HA-Tajiks), Tibetans and Sherpas are three groups of high-altitude native people in China. The differences in the mtDNA genome between the three populations and the role of the mtDNA genome in the high-altitude adaptation of HA-Tajiks were seldom investigated. In this study, 80 HA-Tajiks were enrolled, and their whole mtDNA genomes were sequenced. The haplogroup of each subject was determined by comparison to the revised Cambridge Reference Sequence (rCRS). Ten additional populations from East Asia and Central Asia, including Tibetans and Sherpas, were selected as references. The top haplogroup was U, followed by H, T and J. Principle component analysis and genetic distance analysis indicated that HA-Tajiks showed a close relationship with Wakhi Tajiks, Pamiri Tajiks and Sarikoli Tajiks, indicating that they should be considered one nation scattered around the Pamirs. The difference in the mtDNA genome between HA-Tajiks and Sherpas was significantly greater than that between HA-Tajiks and Tibetans. Among the 13 genes related to the OXPHOS pathway encoded by the mtDNA genome, HA-Tajiks showed more significant differences in ND3 and CYTB compared to Tibetans. Compared to Sherpas, HA-Tajiks showed more significant differences in ND1, ND2, COX1, ATP8, ATP6, ND3, ND4L, ND4, ND5 and CYTB. The associated functional changes and underlying molecular mechanisms should be explored by molecular and biochemical investigations in further studies.

There are three groups of high-altitude natives living in China, including high-altitude Tajiks (hereafter referred to as HA-Tajiks), Tibetans and Sherpas. The HA-Tajik population is one of 56 ethnic groups in China. HA-Tajiks have lived for generations in the Xinjiang Taxkorgan Tajik Autonomous County, where the average elevation is more than 4,000 m. Taxkorgan is perched in the highest part of the Pamirs. The world's second highest peak, Mount Qogir, towers over the south, and in the north stands Mount Muztagata, "the father of ice peaks. " The HA-Tajik people seldom intermarry with other ethnic groups, and this ethnic identity means that their genetic structure shows little mixing with outsiders. Thus, it is attractive and feasible to explore the role of mtDNA in the high-altitude adaptation of HA-Tajiks. Tibetans are considered to be an ethnic group that is adapted to the high-altitude environment, and it has been reported that mtDNA variations influence the efficiency of oxygen utilization and function in native Tibetans 14 . Sherpas live south of the Himalayas and are famous for their physical ability in climbing Mount Everest and are well-known as porters in the Himalayas. Their characteristics at high altitudes are considered as markers of high-altitude adaptation. Previous studies have found that eNOS, PPARA 15 , HIF, ACE 16 and EPAS1 in the nuclear genome and CYTB, ATP6, ND1, ND4 and ND5 in the mtDNA genome play a great role in the high-altitude adaptation of Sherpas 17 .
Recently, a study analyzed mtDNA genomes in different populations in Central Asia located around the Pamirs 18 , and different populations of Tajiks living at high altitude were also investigated. Because partial mtDNA sequences are not able to provide enough information, so complete mtDNA genome sequences were required for this research. To better understand the genetic structure of the mtDNA genome and identify the possible role of the mtDNA genome in high-altitude adaptation in HA-Tajiks, 80 HA-Tajik individuals living in Taxkorgan were enrolled, and their whole mtDNA genomes were sequenced. We also examined the mtDNA genomes of Tibetans and Sherpas as well as other reported Tajik populations 18 to compare the genetic differences between them and to analyze different patterns of high-altitude adaptation between three high-altitude native populations at the Qinghai-Tibetan Plateau and in the Pamirs from perspectives of mtDNA variations.

Results
Study population and reference population in this study. A total of 11 populations, including 706 subjects, were analyzed in this study. The basic information of the subjects is listed in Table 1. mtDnA genetic diversity of tajiks. The number of haplotypes, nucleotide diversity, haplotype diversity, Tajima's D and Fu's Fs were 73, 0.00216 ± 0.00024, 0.997 ± 0.00001, − 2.277 and − 33.741, respectively. The main frequencies of the haplogroups in the 80 HA-Tajiks analyzed in this study are shown in Fig. 1, and the major haplogroups were U, followed by H, T and J, indicating that the HA-Tajik population settled in Taxkorgan, China, may have originated from Europe 18,19 . Haplogroup of each subject was provided in Table S1. Phylogeny of 80 HA-Tajiks calculated by mtPhyl (version 5.003) was provided as dataset1 with a xlsx file. Detailed frequencies of haplogroups in 80 HA-Tajiks as well as other reference population were provided in Table S2.
Bayesian skyline plot (BSP) for HA-Tajiks. BSP was conducted to trace historical variations in population size based on coding regions of mtDNA genome (Fig. 2). Although the sample size of HA-Tajiks is small in this study, it could be inferred that the effective population size of HA-Tajiks is steadily growing, which is consistent with demographic data of HA-Tajiks in China, especially in recent years (https ://www.china .org.cn/e-group s/shaos hu/shao-2-tajik .htm). Besides, the expansion of HA-Tajiks revealed by the BSP is also in accordance with the negative Tajima's D and Fu's Fs.
Genetic relationship of HA-Tajiks with other populations. PCA was applied based on the frequencies of haplogroups in the mtDNA genomes to represent the relationships among the 11 populations enrolled in this study after transformation (Fig. 3). The detailed frequencies of the mtDNA haplogroups in the 11 popula-   Table S2.   (Table S2).  Tables 2 and 3, and the remaining results related to OXPHOS are provided in Table S3 and Table S4. Table 2 showed that the frequencies of the variant polymorphisms 10400T, 14783C, 15043A and 15301A in Tibetans were significantly higher than those in Tajiks. Polymorphism 10400T belongs to ND3 and polymorphisms 14783C, 15043A and 15301A to CYTB. Hence, it seemed that polymorphisms in CYTB may provide more clues about high-altitude adaptation in Tibetans on the Qinghai-Tibetan Plateau than in HA-Tajiks in the Pamirs, and other regions of the mtDNA genome should be evaluated in relation to high-altitude adaptation in HA-Tajiks.
Compared to Sherpas, HA-Tajiks exhibited more polymorphisms in multiple regions of the mtDNA genome, and detailed results are listed in Table 3. The mtDNA genome of HA-Tajiks showed significant differences in ND1, ND2, COX1, ATP8, ATP6, ND3, ND4L, ND4, ND5 and CYTB, which are more complicated than the  www.nature.com/scientificreports/ comparisons between Tajiks and Tibetans. Polymorphisms 14783C and 15301A of CYTB showed the most significant differences in the distribution between HA-Tajiks and Sherpas. In addition, it could also be inferred that Tibetans and Sherpas may present different patterns of high-altitude adaptation from the perspective of the mtDNA genome, because frequencies of certain variants in mtDNA genome also showed significant differences between Tibetans and Sherpas.

Discussion
This is the first study to compare and analyze molecular evidence of high-altitude adaptation in three highaltitude native groups from the perspective of the mitochondrial genome. In this study, the whole mtDNA genomes of 80 HA-Tajiks living in the Pamirs were sequenced, and the major mitochondrial haplogroups in HA-Tajiks were U, H and T as well as J confirmed that HA-Tajiks settled in the Pamirs were likely to be originated in Europe. BSP revealed by BEAST indicated that the effective population size was steadily increasing, which was in consistent with the current status of HA-Tajiks in China. Based on PCA and genetic distance analysis, we found that the HA-Tajiks enrolled in this study exhibited a close relationship with Wakhi Tajiks, Pamiri Tajiks and Sarikoli Tajiks. The difference in the mtDNA genome between HA-Tajiks and Sherpas was significantly greater than the differences between HA-Tajiks and Tibetans, suggesting different patterns of high-altitude adaptation were existed between HA-Tajiks, Tibetans and Sherpas. BSP was an effective tool to analyze population size based on mtDNA genome sequences. However, different settings of analysis methods would generate various results even dealing with the same sequences. Compared to Peng's research 18 , significant decline in effective population size was observed by BSPs, which is conflict with negative Tajima's D and Fu's Fs yielded in his research. Possible explanations were different sample strategies, and mainly were attributed to different parameters of strict clock model and analysis methods. In Peng's research, this strict clock model was 1.404 × 10 −8 substitutions per site per year and the TrN + I + G substitution model, and 2.308 × 10 −8 substitutions per site per year as well as the GTR model of nucleotide substitution with empirical base frequencies were employed in this research. In addition, appropriate analytical methods are not easily determined when facing big data. Hence, parameters setting of software are not only depend on references but also in accordance to the actual situation as well as population history.
Although HA-Tajiks, Tibetans and Sherpas are high-altitude natives, they exhibited great differences in their languages, customs and origins 18,20 . Even after the long-term settlement of high-altitude environments, these three groups may show different responses to hypoxia stimulation at the molecular level, and variants in the mtDNA genome provided an useful tool for further analysis 1 . Mitochondrion have been reported to be involved in high-altitude adaptation in Tibetans and Sherpas. The mitochondrial nt3010G-nt3970C haplotype 21 and haplogroup M9a1a1c1b in Tibetans 13,22 and haplogroups C4a3b1 and A4e3a in Sherpas 17 contributed to highaltitude adaptation. Compared to Tibetans, the distribution of polymorphisms 10400T, 14783C, 15043A and 15301A showed the most significant differences in Tajiks. However, the frequencies of these polymorphisms in Tajiks were significantly lower than those in Tibetans, indicating that other regions of the mtDNA genome should be considered in the assessment of the role of mtDNA in high-altitude adaptation in HA-Tajiks. Among the 13 genes belonging to the OXPHOS pathway, we found that the frequency of variant polymorphism 12372 (12372A) in ND5 was the most significantly increased compared to that in Tibetans (Table S3). Taking U haplogroup as an example, 12372A is the marker of the U haplogroup, which accounted for the main lineage in the 80 HA-Tajiks enrolled in this study. Functional changes in 12372A were not detected, and this polymorphism showed no association with sudden infant death syndrome in the Swiss population 23 . Although the 12372A variant in ND5 does not result in an amino acid change, it may affect the DNA-RNA network, mRNA stability, RNA splicing, translation kinetics and protein folding, etc. 15,24 , which ultimately contributes to disease progression. It seems that many pathogenic mutations are enriched in the mtDNA genomes of high-altitude native peoples 13 . However, these deleterious mutations do not induce a harmful phenotype in high-altitude natives, indicating that the diseases caused by mitochondrial mutations may be population specific. Hence, the role of 12372A in high-altitude adaptation in Tajiks needs to be reevaluated with larger samples, and the molecular mechanism associated with the 12372A mutation should be analyzed in further studies.
In the comparison between Tajiks and Sherpas, the frequencies of polymorphisms 3594T and 4104G in ND1, 7146G and 7256T in COX1, 8468T in ATP8, 8655T in ATP6, 10664T and 10688A in ND4L, 10810C and 10915C in ND4, and 13105G, 13276G, 13506T and 13650T in ND5 were found to be significantly higher in Tajiks. In addition, these variants seemed to be in linkage disequilibrium. Except for polymorphisms 7146G, 13105G and 13276G, all other polymorphisms mentioned above did not induce amino acid changes. 7146G, 13105G and 13276G result in amino acid changes from threonine to alanine, isoleucine to valine and methionine to valine, respectively. However, genotype-phenotype association and functional analyses based on three nonsynonymous amino acid mutations have seldom been reported. The 7146G mutation was located in COX1, and this variant may influence the biogenesis of COX1 and ultimately cause a different response at the molecular level in Tajiks in the Pamirs compared with Sherpas on the Qinghai-Tibetan plateau, which needs to be checked through functional analysis. 13105G and 13276G in ND5 were found to be associated with a risk of cervical cancer 25 . ND5 is a primary subunit of nicotinamide adenine dinucleotides, and a mutation in ND5 can induce higher levels of reactive oxygen species 26 . Combined with 13506T and 13650T in ND5, it seemed that alterations in ND5 would induce cumulative effects and may play a great role in high-altitude adaptation in Tajiks. However, the functional significance and the underlying mechanisms are unknown. Further investigations including molecular and biochemical studies are needed to explore the role of mutations in ND5 in high-altitude adaptation in HA-Tajiks.
With the development of DNA sequencing technologies, next-generation sequencing has been widely applied in genetic studies, and the accuracy has increased significantly. Traditional Sanger sequencing based on PCR products was performed in this study, and we found that checking DNA sequencing results and the assembly Scientific RepoRtS | (2020) 10:10592 | https://doi.org/10.1038/s41598-020-67519-z www.nature.com/scientificreports/ of DNA fragments manually is highly time and energy consuming. Hence, in the acquisition of whole mtDNA genomes, next-generation sequencing should be considered as the main tool in further analysis, and Sanger sequencing could be a powerful additional complement, especially when the results acquired by next-generation sequencing are ambiguous.

conclusion
We found that the lineages including U, H, T and J accounted for most HA-Tajik samples, indicating an European origin, and the HA-Tajik population showed a closer relationship with Wakhi Tajiks. BSP revealed by BEAST indicated that the effective population size was steadily increasing. Among 13 genes belonging to the OXPHOS pathway encoded by the mtDNA genome, HA-Tajiks showed the most significant differences in ND3 and CYTB compared to Tibetans. Compared to Sherpas, HA-Tajiks showed the most significant differences in ND1, ND2, COX1, ATP8, ATP6, ND3, ND4L, ND4, ND5 and CYTB. The associated biochemical changes and molecular mechanisms should be explored by functional investigations in further studies.  Table S7. Sequence data for the mtDNA genome were acquired from the 1000 Genomes Project or downloaded from GenBank as the references indicate. Variant sites and haplogroups were also acquired by using Mitotool as mentioned above. All the mtDNA sequences of 80 HA-Tajiks as mentioned above were deposited in Genbank (MT554196-MT554275).

Materials and
Analysis of mtDnA genomes. The mtDNA genetic diversity of HA-Tajiks (including haplotype numbers, nucleotide and haplotype diversity, Tajima's D, and Fu's Fs) was calculated by DnaSP6.0 28 . The genetic distances between the HA-Tajiks and the reference populations based on the mtDNA genomes were analyzed by PCA based on the haplogroup distributions in different populations after normalized transformation 29 . The Fst values between the populations enrolled in this study were calculated with ARLEQUIN 3.5.1.3 30 . A phylogenetic tree was constructed with a neighbor-joining tree with MEGA 7.0. The phylogeny was constructed by using mtPhyl 31 (https ://sites .googl e.com/site/mtphy l/home) with 80 complete mtDNA sequences in HA-Tajiks. The variants located in regions 16519, 16180-16193 and 310-315 and the AC indels in regions 515-522 were not included in the subsequent analysis 18 .
Bayesian phylogenetic analysis of HA-Tajiks. www.nature.com/scientificreports/ Detailed differences in the mtDNA genomes between Tajiks, Tibetans and Sherpas. To further identify the differences between HA-Tajiks, Tibetans and Sherpas, their whole-mtDNA-genome sequences were compared to the rCRS to confirm the variants in the mtDNA genomes in detail. The number of variants in different regions of the mtDNA genome was calculated by direct counting, and the variant distributions in different regions were analyzed by the chi-square test, which may reveal different patterns of high-altitude adaptation among HA-Tajiks, Tibetans and Sherpas. For the chi-square test, a Bonferroni correction was applied for multiple tests. Because the number of comparisons was 2 (HA-Tajik vs Tibetan, HA-Tajik vs Sherpa), the statistical significance level was set at 0.05/2 (0.025), and all p values were two-tailed.