Introduction

Type 2 diabetes mellitus (T2DM) has become one of the greatest challenges for public health worldwide in the 21st century1. Diabetic complications, including cardiovascular disease, stroke, kidney disease and even cancer, contribute to about 4 million deaths per year globally2. According to International Diabetes Federation, the number of diabetic patients could possibly increase from 366 million in 2011 to 552 million by 20303. In Xinjiang Uyghur Autonomous Region, the prevalence of T2DM is significantly higher in Uyghur people than in other ethnic minorities of close phylogenetic relationship4. At the same body mass index (BMI) level, Uyghurs have greater risks for diabetes compared to other ethnic minorities in this area5. However, the factors attributed to this high prevalence of T2DM in Uyghur people remain unexplored.

T2DM is a progressive multifactorial disease, characterized by insulin resistance and reduction in insulin secretion6. Insulin resistance has developed many years before the diagnosis of T2DM7. Mitochondrial DNA (mtDNA) plays an important role in insulin resistance and T2DM. Mitochondrial dysfunctions, such as reduced capacity to mitochondrial oxidative phosphorylation, submaximal ADP-stimulated oxidative phosphorylation and mitochondria plasticity, have been implicated in the pathogenesis of insulin resistance and diabetes7,8,9,10. The effects of mitochondrial haplogroups and SNPs on diabetes have also been explored. In Japanese and Korean populations, the haplogroup N9a was indicated to possibly confer resistance against T2DM11. In contrast, in Chinese population, N9a is significantly associated with an increased risk of diabetic nephropathy12. D4 haplogroup had borderline resistance to T2DM in Chinese Han individuals, whereas B haplogroup associated with an increased risk of T2DM13. In Finnish, m.16189 T > C (D-loop) and m.3010 G > A (16S rRNA) was found to be more frequent in patients with diabetes than in controls14. In Chinese Han population, m.3243 A > G (16S rRNA) was associated with high prevalence of T2DM15.

T2DM is a multi-factorial common disease affected by both genetic and environmental factors. For example, racial/ethnic factor can significantly modify the prevalence of T2DM. Studies performed in different populations often exhibit diverse even conflicting results due to different health behaviors, genetic factors and individual-level sociodemographics16. To investigate the association between mtDNA and T2DM in Uyghur people, we conducted a hospital-based case-control study on 210 Uyghur individuals including 88 T2DM patients and 122 controls.

Results

Mitochondrial genome analysis

According to mitochondrial phylogenetic tree from Phylo-tree17, we classified the samples into 12 haplogroups (A, B, C, D4, F, G, H, HV, J, T, U and W), haplogroups (D5, I, X, Z, O, W, K, M7, M8, M9, N30) with a limited number (smaller than 6, 3%) were clustered to group Others in M and group Others in N. Haplogroups A, B, C, D4, F and G were common in Asians but rare in Europeans, while haplogroups H, HV, J, T, U and W were very rare in Asians but common in Europeans. This result was consistent with human mitochondrial haplogroup distribution formed by the Human migration (www.mitomap.org), suggesting that Uyghur people was an admixture of populations from Europe and Asia. Therefore, subject clustering was required before investigating potential association between mtDNA and T2DM. PCA analysis was performed and all 210 subjects were classified into 4 clusters (Fig. 1). The distribution of haplogroups in each cluster were shown in Fig. 2.

Figure 1
figure 1

Subjects distribution according to the PCA analysis. We employed Smartpca from EIGENSOFT package and K-means method from Weka to classify all 210 subjects into 4 clusters. Different colors indicated different clusters. Triangle indicates case subjects. Crossover indicates control subjects. Principal component 1 (PC1) and principal component 2 (PC2) were the first two axes of PCA and represented 7.626% and 5.276% variance.

Figure 2
figure 2

Distribution of haplogroups in 4 different clusters. Haplogroups with a sample size smaller than 6 (3%) are allocated into groups ‘OtherI’ and ‘OtherII’. Different colors indicated different clusters.

Using multivariate logistic regression analysis, we found that people harboring haplogroup H (odds ratio [OR] = 1.40; 95% confidence interval [CI]: 1.20–1.64; P = 0.0005138) and D4 (odds ratio = 1.47; 95% CI: 1.22–1.77; P = 0.001064) was more frequent in patients with diabetes than in controls (Table 1).

Table 1 Multivariate logistic regression analysis of mitochondrial haplogroups.

In addition, the markers related to haplogroups D4 and H, including variants m.3010 G > A (D4), m.8414 C > T (D4), m.14668 C > T (D4), m.2706 G > A (H) and m.7028 T > C (H) were more frequent in cases than in controls (Table 2), by Fisher’s exact test. Of these variants, m.8414 C > T from the haplogroup D4 was non-synonymous and located in mitochondrially encoded ATP synthase protein 8 (MT-ATP8). The m.2706 G > A and m.3010 G > A were located in mitochondrially encoded 16S RNA.

Table 2 Analysis of SNPs related to haplogroup D4 and H in 4 clusters※.

Variant damage analysis

We then extracted unique non-synonymous variants in cases (Table 3) to explore their potential functions. Four popular predictors were applied to assess the impact of AA substitution caused by these variants on the properties of mitochondrial proteins. The predictors were PolyPhen-218, CADD19, PANTHER20 and CAROL21.

Table 3 Unique SNPs from subjects with T2DM.

Interestingly, m.8414 C > T was highly suspicious. Three of four predictors suggested that m.8414 C > T was likely to influence the stability of the protein and may contribute to disease development (Table 4).

Table 4 Protein properties analysis results.

Protein 3D structure analysis

We performed 3D structure analysis using CABS-fold22 to assess the effect of variant p.Leu17Phe (m.8414 T > G) on the structure of MT-ATP8. The 3D structure of MT-ATP8 was displayed in Supplementary Figure 1.

In term of molecular modeling, potential energy of a system of atoms is calculated as force field, in which angles, bonds, torsions and impropers are taken into consideration. In current study, GROMOS96 force field was used to estimate the potential energy of MT-ATP8 by Swiss-PDBviewer23 (version 4.10). When MT-ATP8 17th amino acid (AA) was mutated from Leu to Phe by Swiss-PDBviewer, a decrease in H-bond number indicated a more unstable structure (Fig. 3). Using Swiss-PDBviewer, the force field around 17th AA was estimated to be increased from −40.646 to −12.161 (Fig. 4), which may lead to unstable protein structure around this area.

Figure 3
figure 3

3D structure of MT-ATP8 protein. (a) MT-ATP8 with p.17 L, red sticks are 17th amino acid Leu; (b) MT-ATP8 with p.17 F, red sticks are 17th amino acid Phe. Green dotted line: H-bonds.

Figure 4
figure 4

Force field curve around the 17th AA between 17 F and 17 L. Grey areas indicate higher energy caused by MT-ATP8 p.17 F. Black or line indicates the force field of MT-ATP8 with p.17 L or p.17 F, respectively.

Protein transmembrane property analysis

The subcellular location of ATP synthase is on the mitochondrial inner membrane, and the ATP synthase protein 8 is a single-pass membrane protein, according to annotations from Uniprot. So it is necessary to investigate whether the mutation p.L17F (m.8414 C > T) in the MT-ATP8 has an impact on the protein’s transmembrane area.

TMpred was used to assess protein sequence with a mutation p.L17F in the MT-ATP8. The result indicated that the transmembrane area of MT-ATP8 was changed from 8th-24th AAs to 11th-29th AAs, which might affect the assembly of ATP synthase (Fig. 5).

Figure 5
figure 5

Possible transmembrane helices change in MT-ATP8. Red or blue vertical line indicates the transmembrane region of MT-ATP8 with m.8414 T or m.8414 C, respectively. Only scores above 500 were considered reliable by TMpred.

RNA structure analysis

The rRNA plays an important role in human mitochondrial ribosome assembly and RNA modification, which are strongly linked with energy metabolism. RNA-fold server was used to predict the structural change caused by variant m.2706 G > A (Fig. 6). Oligowalk from RNAstructure24 (version 5.3) was employed to compare the target binding ability between 16S rRNA with and without m.2706 G > A. Notably, the structural change caused by variant m.2706 G > A from haplogroup H might weaken the binding between mitochondrial 16S RNA and its nucleic acid targets (Fig. 7). Therefore, structural change of mitochondrial 16S rRNA caused by variant m.2706 G > A might affect RNA modifications and ribosome assembly, in Uyghur population.

Figure 6
figure 6

Secondary structure of 16S RNA. (A) Secondary structure of 16S RNA with m.2706 G. (B) Secondary structure with m.2706 G > A.

Figure 7
figure 7

Oligo-Target binding curve. Green or red curve indicates the score of 16S rRNA with m.2706 G or m.2706 A, respectively. A more negative value indicates a tighter binding. Dashed line shows the position corresponding to m.2706 G > A.

Discussion

In this case-control study, we have identified an association between mitochondrial DNA and T2DM in Chinese Uyghur population. Mitochondrial DNA sequencing was conducted in 88 T2DM patients and 122 controls. Haplogroups H and D4 were found to be more frequent in T2DM patients than in controls.

A variety of diseases, such as Alzheimer’s disease, cardiovascular disease, Leber’s hereditary optic neuropathy (LHON) and T2DM have been associated with mitochondrial haplogroups25,26,27,28. It was reported that racial factors played an important role in the prevalence of T2DM16. Researches performed in different populations presented different even conflicting results. For Han Chinese, haplogroup N9a increased the risk of T2DM complications12, while in Japanese population, N9a was associated with resistance against T2DM11 or against metabolic syndrome in Japanese women29. In Chinese Taiwan population, haplogroup D4 was related to a borderline reduced risk for T2DM13. In contrast, haplogroup D4 was associated with an increased risk of T2DM in Korean population11 and Chinese Uyghur population in present research.

In our study, haplogroup D4 was more frequent in Chinese Uyghur T2DM patients, which was consistent with the findings in Finnish population, showing that variant m.3010 G > A was more frequent in cases than in controls. The variant m.3010 G > A marked haplogroup D4 in Chinese Uyghur subjects and haplogroup H1 in Finnish population. This indicated that m.3010 G > A variant was likely to be involved in the pathogenesis of T2DM. However, its biologic functions in the pathogenesis of T2DM require further investigation.

Haplogroup H was found to be associated with enhanced mitochondrial activities, including higher mitochondrial respiratory capacity (intrinsic activity)30, mitochondrial oxidative damage31 and intracellular ATP concentration32, while its sub-haplogroup H1 was associated with T2DM in Finnish population. In present study, haplogroup H showed a significant association with T2DM. Variants m.2706 G > A and m.7028 T > C were related to increased incidence of T2DM in Uyghur people harboring haplogroup H. The SNP m.2706 G > A was located in mitochondrially encoded 16S RNA, while m.7028 T > C was synonymous.

The structure of mitochondrially encoded rRNA is critical for human mitochondrial ribosome assembly33. RNA modifications also affect energy metabolism34. Structural changes of mitochondrial 16S rRNA might regulate RNA modifications, protein synthesis as well as ribosome assembly. In present study, a structural change in 16S rRNA caused by m.2706 G > A from haplogroup H was observed, and thus the binding ability of mutated 16S rRNA might be altered. This structural change might affect 16S rRNA modifications and mitochondrial encoded protein synthesis, which are important for energy metabolism.

In this study, we found that haplogroup D4 and H were more frequent in Chinese Uyghur T2DM patients. Variants m.2706 G > A and m.8414 C > T might be risk factors in the development of T2DM in Chinese Uyghur population.

Materials and Methods

Samples and mitochondrial genome sequencing

The healthy controls and cases, residents of Heitan, Keshi, and Korla districts in Xinjiang Uyghur Autonomous Region, were recruited from May 2010 to November 2011 at the Fifth Affiliated Hospital of Xinjiang Medical University. A written informed consent has been obtained from each subject. This study has been approved by Ethics Committee at the Fifth Affiliated Hospital of Xinjiang Medical University. All procedures were conducted according to the regulations and guidelines approved by the ethics committee at the Fifth Affiliated Hospital of Xinjiang Medical University.

Demographic and clinical information was extracted from electronic medical record, including age, gender, ethnicity, medical history, smoking and alcohol drinking. Smoking was defined as a current smoker or a former smoker (who quits) with tobacco/cigarettes consumption of more than 10 packs per year. Alcohol drinking was defined as alcohol consumption of at least 50 g per week for at least 6 months. Body weight and height were measured with light clothing and in bare feet. Blood samples were collected after overnight fasting. Glucose, Total Cholesterol, LDL, HDL and Triglyceride were measured with standard biochemical assays. T2DM was diagnosed according to WHO 1999 Diagnostic Criteria. Hypertension was defined as systolic pressure ≥140 mmHg and/or diastolic pressure ≥90 mmHg (from an average of 2 measurements or pre- and post-therapy). Hyperlipidemia was defined as fasting venous total cholesterol ≥200 mg/dL or LDL-C ≥ 130 mg/dL.

A total of 210 Uyghur objects, including 88 subjects with T2DM and 122 control subjects, were recruited in this study. Genomic DNA was extracted from peripheral whole blood using a DNA Blood Mini Kit 250 (QIAGEN Translational Medicine Co., Ltd., German). Fragments of mitochondrial DNA were amplified by PCR. The reaction mixture (100 μl) contained 20 ng of genomic DNA, 30 pmol of each primer, 5 mmol of each dNTP, and 2.5 U of DNA polymerase in a 10× PCR buffer. The cycling conditions were as follows: 94 °C for 5 min, followed by 35 cycles of denaturation at 94 °C for 30 sec, annealing at 65 °C for 40 sec, and extension at 72 °C for 4 min, with a final extension at 72 °C for 10 min. The fragments were purified and sequenced by Applied Biosystems 3730 DNA automatic sequencer at the Beijing Genomics Institute (Beijing, China). Primers for sequencing and PCR were shown in Supplementary Tables 1 and 2, respectively. The sequence data reported in this paper have been deposited in the Genome Sequence Archive of Beijing Institute of Genomics, Chinese Academy of Sciences. The information about age, gender and BMI of samples were shown in Supplementary Table 3.

Mitochondrial haplogroups classification

The resultant sequence data are aligned against with the Cambridge reference sequence (GenBank No., NC_012920). We determine the haplogroups of mitochondrial DNA according to the mtDNA tree17 and Mitomaster35.

Statistical analysis

We applied smartpca program from the EIGENSOFT version 3.0 to analyze the PCA and used K-means method from software Weka to cluster the PC1, PC2 and PC3 from PCA analysis.

We performed multivariate logistic regression analyses to demonstrate the association between the haplogroups and T2DM by glm function from R package (version 3.3.2), considering T2DM as a dependent variable, whereas information on haplogroups, the first two PCs from PCA, BMI, age and sex of each subject as independent variables. In multivariate logistic regression analyses, haplogroups with sample size smaller than 6 (3%) are allocated into group ‘Others in M’ and group ‘Others in N’ according to the phylogenetic tree. All P-values are corrected by Bonferroni method.

We used Fisher’s Exact Test to compare the frequencies of SNPs between cases and controls within each cluster. We calculated the P-values by two-tailed Fisher’s Exact Tests from R version 3.3.2, along with odds ratio (OR) and 95% confidence interval (CI). All P values were adjusted by Bonferroni correction. For Bonferroni correction, P values are adjusted according to the amount of haplogroups and SNPs in different clusters.

Variant damage analysis

We used a series of predicting tools, including PolyPhen-218, CADD19, PANTHER20 and CAROL21, to analyze a specific protein with special mutations. These tools focus on different areas, disease related SNPs, including conservation in the phylogenetic tree and structure features in transmembrane, intramembrane, signal peptide, binding region, etc. A variant was likely to be suspicious as disease-causing if more than 2 tools suggest deleterious.

Protein 3D structure analysis

We used CABS-fold server22 to predict the 3D structure of MT-ATP8 protein, due to lack of available X-ray 3D structure of human MT-ATP8 molecule. Afterwards, we used SWISS-Pdbviewer (version 4.10) to analyze mutations and to estimate force field.

RNA structure analysis

The RNA-fold sever (http://rna.tbi.univie.ac.at/cgi-bin/RNAfold.cgi) was employed to predict the secondary structure of mitochondrially encoded 16S RNA. We adopted the minimum free energy prediction model. Oligowalk algorithm from RNAstructure program24 was employed to assess the target binding ability.

Data availability

The sequence data reported in this paper have been deposited in the Genome Sequence Archive of Beijing Institute of Genomics, Chinese Academy of Sciences, with an accession number CRA000365.