Mitochondrial haplogroup J associated with higher risk of obesity in the Qatari population

Obesity, a major risk factor for metabolic disorders, is highly prevalent in Qatari population. Maternal transmission of obesity traits can be significant; for example, X haplogroup is known to be associated with lower BMI and body fat mass in Northern Europeans and T haplogroup which is a sister haplogroup of J is known to be associated with obesity in Caucasian subjects from Austria and Southern Italy. We aimed to delineate the mitochondrial haplogroups and variants associated with obesity in Qatari population. Mitochondrial genomes of 864 Qatari individuals were extracted from whole exome sequencing data with an average coverage of 77X. We distributed the participants into 2 sub-cohorts: obese (BMI ≥ 30) and non-obese (BMI < 30); the mean value of BMI from these two groups were 36.5 ± 5.7 and 26.5 ± 2.6, respectively. Mitochondrial haplogroup profiling followed by uni- and multivariant association tests adjusted for covariates were performed. Qatari individuals with mitochondrial haplogroup J had an increased (twofold) risk of obesity (odds ratio [OR] 1.925; 95% CI 1.234–3.002; P = 0.0038; the Bonferroni adjusted P value threshold is 0.0041), whereas the individuals with haplogroup X were at low risk of obesity (OR 0.387; 95% CI 0.175–0.857; P = 0.019). Further, a set of 38 mitochondrial variants were found to be associated (at P ≤ 0.05) with obesity in models adjusted for age, sex and haplogroup.

after gastric bypass and bariatric surgery 12 . A novel mitochondrial tRNA Cys mutation at MT:5802A > G observed in 3 matrilineal relatives has been attributed as the probable cause for obesity in the family with matrilineally inherited obesity 13 . Furthermore, the mitochondrial haplogroup T has been suggested to increase the risk of obesity in 2 independent association studies on Austrian 14 and Southern Italian 15 populations. Maternal haplogroups X and H have been proposed to reduce the risk of obesity in Caucasians of northern European origin in the United States (US) 16 as well as in Arabs living in Kuwait 17 , respectively. It is worth noting that these studies on maternal haplogroups were performed using variants that were extracted either from genome-wide genotype data generated using BeadChips in genome-wide association studies or from sequencing partial sections of the mitochondria genome using Sanger sequencing. Both genotyping strategies limit the number of analyzed SNPs and subsequently impact the resolution of maternal haplogroup prediction and the overall results. Nextgeneration sequencing (NGS) offers high throughput for large cohorts while still offering the advantages of faster processing and lower cost compared with Sanger sequencing. A number of studies have demonstrated that both whole genome and exome sequencing can indirectly target the whole mtDNA with good coverage 18,19 , and such studies have addressed the genetic basis of both monogenic 20,21 and multifactorial disorders 22 .
In this study, we extracted mtDNA from whole exome data on 864 Qatari individuals and evaluated the mitochondrial variants and haplogroups for associations with obesity.

Results
Study population. The clinical characteristics of the Qatari samples are presented in Table 1. The mean value of BMI of the individuals from the cohort was 32.7 ± 6.8. The obese and non-obese groups differed significantly in sex distribution (P < 0.001), age (P = 0.006), and body mass index (BMI) (P < 0.001). The mean value of BMI from these two groups were 36.5 ± 5.7 and 26.5 ± 2.6, respectively. mtDNA coverage and variants. The average coverage across the whole mitochondrial genome of the Qatari whole exome samples was 77X (Fig. 1), while the average coverage of the mtDNA of the Qatari whole genome samples (technical replica) was 4537X.
A total of 1831 mitochondrial variants (including 1775 SNPs and 56 INDELs) were identified using the whole exome samples. All mitochondrial SNPs identified in the 8 whole genomes were captured by the whole exomes, apart from one SNP, namely, MT:3492 A > C, which we identified only in the whole genomes of 4 individuals. Regarding INDELs, we observed inconsistencies between the whole exome and the whole genome in 4 instances (at MT:302, MT:8271, MT:16,179, and MT:16,182).
Mitochondrial haplogroup association with obesity. The average quality score of predicted haplogroups for all our samples using HaploGrep2 was 93%. In total, we identified 15 major mitochondrial haplogroups (B, E, H, HV, I, J, K, L, M, N, R, T, U, W, and X) in the 864 whole exome samples (Fig. 2). The most common mitochondrial haplogroups in the Qatari samples were U (15%), J (14%), R (12%), T (12%), and L (11%). The mitochondrial haplogroup assignments observed in the 8 whole genomes (technical replica) were the same as those observed in their respective whole exomes (Table 2). Figure 3 shows the clustering patterns of the samples based on principal component analysis, considering obesity status and mtDNA haplogroup. The samples clustered well when we considered the maternal ancestral background through their mtDNA haplogroups rather than considering the obesity status or sex (plot not shown).
In mitochondrial comparison analysis, haplogroups with limited number of individuals were grouped together as one group called "others". Table 3 and Fig. 4 show the mitochondrial haplogroup frequencies for the www.nature.com/scientificreports/      www.nature.com/scientificreports/ corrected for multiple testing of 0.0041 to identify statistically significant associations. Only the J haplogroup has an adjusted P-value of 0.0038 (after adjusting for age, sex and haplogroups) which makes it statistically significant against the Bonferroni corrected P-value threshold of 0.0041.

Mitochondrial variant association with obesity.
Quantile-quantile plots depicting the expected and observed − log (p-values) for association of the mitochondrial variants with obesity are presented in Fig. 5. Genomic-control inflation factor (λ) was observed as 1.0035 in tests with corrections for age, sex and haplogroups. The value at close to 1.0 do not necessitate correcting association statistics for genomic-control inflation. Table 4 presents the results of multiple regression test, performed to identify the variants associated with obesity at P-values < 0.05; it is to be noted that we did not apply Bonferroni correction towards multiple testing for total number of variants analyzed (n = 1831) for the P-value threshold for significance (= 0.05/1831 = 0.000027]). We observed a set of 38 mtDNA variants to have nominal associations (P-value ≤ 0.05) with obesity, with models adjusted for age, sex, and mitochondrial haplogroup. 26 were positively correlated with the risk of obesity, and 12 were negatively correlated with risk of obesity. Of these variants associating with obesity, five (namely, MT:295C > T, MT:11377G > A, MT:12171A > G, MT:16145G > A and MT:16222C > T) were seen correlated with others in the list of associated variants. Conditional analysis using the top leading MT:16069C > T variant with the lowest p-value from Table 4 as the conditioning SNP, identified 14 of the remaining variants remaining as significant. We assume that these 14 variants (see the footnote to Table 4) to have an independent effect on the phenotype with respect to the conditioned SNP. We assume that the remaining 23 mitochondrial variants (that included the above-mentioned LD-dependent 5 variants) from Table 4, which lost their significance, are dependent on the top leading mitochondrial variant.
Furthermore, we found 8 additional variants (not included in Table 4) that were detected in only one of the BMI groups in more than 3 individuals. Of these 8 variants, (a) 7 were observed in the groups with obesity, including 5 synonymous variants in MT-ND4L (15 individuals), MT-ND6 (14 individuals), MT-CO2 (11 individuals), MT-ND5 (7 individuals), and MT-ND2 (7 individuals) genes, as well as 2 non-coding variants in MT-TP (12 individuals) and MT-TF (7 individuals) genes; (b) 1 was observed in the group without obesity, namely, a synonymous variant in the MT-ND1 gene (4 individuals).
We assessed the clinical significance of the associated variants by examining the ClinVar, Mitomaster, and Mitomap databases and found that none of the variants was annotated as a pathogenic mutation for obesity or for other disorders.
Most of the variants identified as associated with obesity were located within the same mitochondrial genes. However, the mitochondrial genes harboring synonymous variants were only positively associated with the risk of obesity, such as cytochrome c oxidase III (MT-CO3), nicotinamide adenine dinucleotide (NADH) dehydrogenase 2 (MT-ND2), NADH dehydrogenase 4 (MT-ND4), NADH-ubiquinone oxidoreductase chain 4L (MT-ND4L), and non-coding variants; as well as tRNA histidine (MT-TH), tRNA serine 1 (MT-TS1), and tRNA phenylalanine (MT-TF). On the other hand, genes harboring both synonymous and missense variants were only negatively associated with the risk of obesity, such as cytochrome c oxidase I (MT-CO1) and non-coding variants, tRNA cysteine (MT-TC), and tRNA threonine (MT-TT).

Discussion
Our study employed whole exome reads to delineate the mitochondrial DNA variants and haplogroups. The mtDNA coverage from whole exome data sequenced using the Agilent V5 kit is good across the whole genome (Fig. 1). The principal component analysis (PCA) clustered samples of the same haplogroup well. The delineation of mtDNA variants using whole exome reads was consistent when whole genome reads were employed. The   www.nature.com/scientificreports/ Our study identified the mitochondrial haplogroup J, the second most frequent haplogroup in the cohort, as significantly associated with a higher risk of obesity. This haplogroup had a significantly higher frequency in the group with obesity than in the group without -obesity. Subclade J1b was the main contributor in our Qatari cohort, with 9% of the samples displaying J1b, a rate similar to the frequency distribution observed in Saudi Arabia (9.4%) 27 . In contrast to our observation of J haplogroup associated with a higher risk of obesity in Qatar, Nardelli et al. (2013) reported that the frequency of haplogroup J was lower in the group with morbid obesity and that the J haplogroup conferred a lower risk of obesity in the Southern Italian population. In a study by Ebner et al. (2015) on Austrian juveniles and adults, haplogroup J showed no association with obesity. These contradictory findings are probably due to differences in geographic origin or simply to differences in study design (such as BMI thresholds for defining obesity status and consideration of only the D-loop region).
Both the above-mentioned studies by Nardelli et al. and Ebner et al. reported that T haplogroup was associated with a risk of obesity in Caucasians from Southern Italy and Austria, respectively, whereas our study showed that J haplogroup was associated with obesity in Arabs from Qatar. It is interesting to note that J and T are sister haplogroups and share polymorphisms 14 , such as MT:4216 T > C, which was significantly correlated with risk of obesity in our study (Table 4). However, the frequency of haplogroup T was not significantly different between our study groups with and without obesity (Fig. 4) and was not associated with obesity (Table 3), which might be due to the fact that the frequencies of haplogroups and their defining variants can vary even among closely related populations, such as Caucasian populations 28 .
The frequency of mitochondrial haplogroup X was higher in the group without obesity than the group with obesity in our study cohort, which is consistent with reported observations that Caucasians of northern European origin in the US with haplogroup X have lower BMI and fat mass values 16 . All the individuals within haplogroup X belonged to Eurasian branch X2, which is present in Saudi Arabia 27 and Yemen 28 .
A recent study 17 reported a higher frequency of haplogroup H in the control group (BMI < 25 kg/m 2 ) compared with the group with overweight and obesity (BMI > 25 kg/m 2 ) among Arabs living in Kuwait. The frequency of haplogroup H in our Qatar study cohort was slightly higher in the group without obesity; nevertheless, the difference was not significant compared with the group with obesity, which might be due to differences in the admixture of Arab ethnicities in the region.
To prioritize our results, we focused on 23 exonic variants that were significantly associated with obesity in the univariate and multivariate analysis (Table 4), as well as 6 exonic variants detected in only 1 of the 2 BMI groups. Most of these variants are within the NADH dehydrogenase subunit genes, complex I, which is involved in cell energy production 24 ; as a result, the prioritized variants (as listed in Table 5) in this study can influence their gene function and are associated with body fat mass and obesity 14,29 . MT-RNR1, which harbors negatively correlated non-coding variants (MT:930G > A and MT:750G > A), is involved in metabolic homeostasis with protective function against diet-induced obesity 30 . The MT-RNR2 gene, which harbors negatively associated non-coding variant MT:2352 T > C, is known to protect against oxidative stress 31 . Cytochrome c oxidase subunit I and II genes regulate the OXPHOS system, which is essential for energy production and survival 32 . Another gene that regulates OXPHOS is the ATP synthase protein 8 (MT-ATP), where we found 2 synonymous variants (MT:8473T > C and MT:8386C > T). Lastly, mutations within the cytochrome b gene (MT-CYB) can cause prominent exercise intolerance, which can lead to obesity 33 . We identified a synonymous variant (MT:15679A > G) in this gene that was positively associated with obesity.   16 ; and MT:8994G > A from the German and French study 34 .
There are limitations in our study that need to be acknowledged. None of the reported associations involving mitochondrial DNA variants and obesity passes the Bonferroni corrected P-value thresholds. The threshold value for BMI (30 kg/m 2 ) used to define the obesity and no obesity/control groups differs from that used in previous obesity studies using mtDNA [14][15][16][17]29,[34][35][36] . We set a higher threshold for BMI in our study mainly because of the low number of lean BMI individuals in the cohort (the mean BMI value of the cohort was 32.7 ± 6.8 kg/m 2 ) and the desire to balance the sample sizes of the 2 groups; however, the mean BMI value in the group without obesity was borderline (26.5 ± 2.6 kg/m 2 ), and that of the group with obesity was much higher (at 36.5 ± 5.7 kg/ m 2 ). Although we used different BMI groupings, we managed to show that haplogroup X was associated with lower BMI, in line with earlier observations 16 . In addition, our study did not explore heteroplasmy, due to the mtDNA coverage and the fact that it would be more suitable with the enriched capture kit for whole mitochondrial genome (higher coverage) and from a variety of specimens.
To conclude, we conducted the largest association tests of mitochondrial haplogroups and variants with obesity performed in the Middle East. Furthermore, our results demonstrated that Qatari individuals with haplogroup J are at increased risk (approximately twofold) of obesity. Our results also confirmed that the frequency of mitochondrial haplogroup X are low in obese individuals in the Qatari population; however, the association signal did not survive the correction for multiple testing.

Materials and methods
Ethics statement. This study was reviewed and approved by the institutional Ethical Review Committee at Dasman Diabetes Institute, Kuwait in accordance with the declaration of Helsinki. The human wholeexome data used in this study were publicly available from the National Center for Biotechnology Information Sequence Read Archive. The original studies 37,38 that generated these data and made available at the public resource had obtained written informed consent of participants who were recruited under protocols approved by the Institutional Review Boards of Hamad Medical Corporation and Weill Cornell Medical College in Qatar.
Study exome data. Whole exome read data from individuals living in Qatar 37,38 , as sequenced using Agilent SureSelect Human All Exon V5 and V4 kits (Agilent Technologies Inc., USA) on the Illumina HiSeq platform, were publicly available from the National Center for Biotechnology Information Sequence Read Archive (SRA accessions SRP060765, SRP061943 and SRP061463). We downloaded the data and considered only those exomes from native Qatari individuals and those that were sequenced with the Agilent V5 kit (Agilent Technologies Inc., USA), resulting in a data set of 864 native Qatari individuals. We divided this cohort into 2 groups based on their BMI: 532 individuals in the group with obesity (BMI of ≥ 30 kg/m 2 ) and 332 in the group without obesity (BMI < 30 kg/m 2 ). We downloaded whole genome sequence data on 8 individuals (common with whole exome samples, sharing the same sample identification number) available from the same Qatari studies and used the data to validate the mitochondrial variants called using exome data. mtDNA sequences and variant calling. We mapped the raw paired-end reads to human genome reference version GRCh37 using Burrows-Wheeler Aligner version v07-17 with default parameters 39 . We employed the Picard tool version 2.20.2 (http://broad insti tute.githu b.io/picar d) to flag and remove duplicate reads. We employed SAMtools version 0.1.19 40 to extract the mtDNA reference sequence (NC_012920.1) and the Genome Analysis Tool Kit (GATK) version v3.8-1-0 41 to calculate the average mtDNA coverage. Subsequently, we created a Genomic Variant Call Format (GVCF) file for each sample using the GATK haplocaller and combined the GVCF files with GATK CombineGVCFs into a single GVCF file for each sequencing capture of whole genome and whole exome samples. We employed GATK GenotypeGVCFs for mtDNA variant identification.
Haplogroup prediction. The final mtDNA variants were in Variant Calling Format files for both the whole exome and whole genome (technical replica) samples, and these were used for haplogroup profiling, which was achieved by using the HaploGrep 2 tool 42 , based on phylotree build 17 (accessed on 2 February 2020). To assess the accuracy of the mitochondrial haplogroup assignment for a whole exome sample, we compared the results with those obtained using the whole genome data when both the whole exome and genome sequence data were available for the sample.
Statistical analysis. We used R software (version 3.6.2) (https ://www.R-proje ct.org/) to perform the statistical analysis on the clinical characteristics. The descriptive statistics for the categorical variables are presented as numbers and percentages, whereas those for continuous variables are presented as mean ± standard deviation and median and interquartile range. We performed normality tests on the traits (such as age and BMI) using the Shapiro-Wilk test and comparisons between the groups with and without obesity using the nonparametric Mann-Whitney U test. In addition, we applied the chi-squared test to determine whether there was a differential distribution of sex in the BMI groups. We conducted a PCA to test whether the mtDNA clustered the samples based on their BMI groups (obesity and no obesity) and to reveal any other hidden relationships such as haplogroup and sex. We used the PCAtools package on R software to conduct the PCA analysis.
We performed Fisher's exact tests to examine the associations between the mtDNA haplogroups and the obesity and no obesity groups. We calculated the ORs and 95% confidence intervals (CI) for each haplogroup and set P < 0.05 as the threshold for statistical significance. Furthermore, to adjust for age and sex, we performed