Whole-exome sequencing identifies common and rare variant metabolic QTLs in a Middle Eastern population

Yousri, Noha A.; Fakhro, Khalid A.; Robay, Amal; Rodriguez-Flores, Juan L.; Mohney, Robert P.; Zeriri, Hassina; Odeh, Tala; Kader, Sara Abdul; Aldous, Eman K.; Thareja, Gaurav; Kumar, Manish; Al-Shakaki, Alya; Chidiac, Omar M.; Mohamoud, Yasmin A.; Mezey, Jason G.; Malek, Joel A.; Crystal, Ronald G.; Suhre, Karsten

doi:10.1038/s41467-017-01972-9

Download PDF

Article
Open access
Published: 23 January 2018

Whole-exome sequencing identifies common and rare variant metabolic QTLs in a Middle Eastern population

Noha A. Yousri^1,2^na1,
Khalid A. Fakhro^1,3^na1,
Amal Robay¹,
Juan L. Rodriguez-Flores ORCID: orcid.org/0000-0002-0394-8062⁴,
Robert P. Mohney ORCID: orcid.org/0000-0002-9315-9541⁵,
Hassina Zeriri¹,
Tala Odeh¹,
Sara Abdul Kader⁶,
Eman K. Aldous⁷,
Gaurav Thareja⁶,
Manish Kumar⁶,
Alya Al-Shakaki¹,
Omar M. Chidiac¹,
Yasmin A. Mohamoud⁷,
Jason G. Mezey⁴,
Joel A. Malek ORCID: orcid.org/0000-0002-1516-8477^1,7,
Ronald G. Crystal⁴ &
…
Karsten Suhre ORCID: orcid.org/0000-0001-9638-3912⁶

Nature Communications volume 9, Article number: 333 (2018) Cite this article

7319 Accesses
50 Citations
19 Altmetric
Metrics details

Subjects

Abstract

Metabolomics-genome-wide association studies (mGWAS) have uncovered many metabolic quantitative trait loci (mQTLs) influencing human metabolic individuality, though predominantly in European cohorts. By combining whole-exome sequencing with a high-resolution metabolomics profiling for a highly consanguineous Middle Eastern population, we discover 21 common variant and 12 functional rare variant mQTLs, of which 45% are novel altogether. We fine-map 10 common variant mQTLs to new metabolite ratio associations, and 11 common variant mQTLs to putative protein-altering variants. This is the first work to report common and rare variant mQTLs linked to diseases and/or pharmacological targets in a consanguineous Arab cohort, with wide implications for precision medicine in the Middle East.

Genome-wide association studies

Article 26 August 2021

Tissue-specific enhancer–gene maps from multimodal single-cell data identify causal disease alleles

Article 09 April 2024

Utility of polygenic scores across diverse diseases in a hospital cohort for predictive modeling

Article Open access 12 April 2024

Introduction

Metabolites represent functional intermediates to the end phenotype, can be conserved over several years’ time frame, and can uniquely identify individuals¹. Several studies have shown that they are influenced by a combination of genetics and environment, the latter comprising both life style exposures and microbial interactions². Recent technological improvements have enabled the accurate detection of thousands of metabolites (collectively, the metabolome), adding highly informative downstream read-outs supporting genetic and transcriptomic signatures in the study of personalized medicine². With the abundance of such omics data, it will become possible to infer causal relationship between constitutive genetic variants and metabolite levels to accurately predict the likelihood of developing pathophysiologic signatures, as a normal individual progresses into a disease state.

To date, there have been several large-scale genome-wide association studies for metabolic traits (mGWAS)^{3,4,5,6,7,8,9,10} the largest (with broad non-targeted metabolomics) of which interrogated 7000 individuals of European ancestry and discovered 145 significant metabolomics quantitative trait loci (mQTLs)¹¹. While studies to date have uncovered hundreds of mQTLs, they have also faced certain limitations. First, they relied on imputed genotyping array data for the discovery of common variant mQTLs, where high-quality SNPs are in non-coding regions. More recently, next-generation sequencing (NGS) technologies have become more affordable and begun to identify protein-coding variants largely affecting metabolite levels, yet on a small scale of individuals or metabolites^12,13,14 and very recently on a larger scale¹⁵ (published while finalizing this manuscript). Second, metabolite detection platforms continue to rapidly improve, and deeper resolution can be gained today than previously possible. Third, as with lack of diversity in most GWAS studies¹⁶, most mGWAS to date have focused predominantly on populations of European descent^7,8,11,17, and recently, Asian and African descent¹⁸, yet little or no efforts have been described in Middle Eastern populations.

We present the first large-scale metabolomics exome-wide association study of a highly consanguineous Middle Eastern population, by combining 1,303 metabolites (from the most recent Metabolon DiscoveryHD4 platform) with deep whole-exome sequencing data of 614 Qataris for discovery, and imputed array data of another 382 Qataris for replication analysis. We integrate this data to discover loci affecting metabolites and metabolite ratios in this population, while fine-mapping loci to putative functional variants at or near sentinel SNPs (a sentinel SNP or sentinel metabolite refers to a lead SNP or a lead metabolite). Moreover, by leveraging elevated consanguinity in this population, we also discover rare variant loci associated with metabolite levels, underscoring the metabolic individuality of subjects from this ethnic population.

Results

Subject selection and genotyping

A total of 996 Qatari subjects were selected for this study (Table 1), of whom 614 were whole-exome sequenced (WES) for the discovery step, and 382 were array genotyped for replication. For the purpose of replication, both data sets were imputed using a reference set of 108 deeply covered, phased Qatari genomes. A total of 1,650,892 imputed exome variants were available for the analysis after Quality Control (see Methods). All samples were analyzed on Metabolon’s DiscoveryHD4 platform, where a total of 1303 metabolites were detected (Supplementary Data 1). After applying strict QC, 826 metabolites (including 249 unknowns) remained for the association analysis (Fig. 1 gives a schematic representation of the study design).

Table 1 Demographics (sample characteristics)

Full size table

Pre-discovery exploration of replication in Caucasians

As an investigation step prior to discovery, we attempted to replicate 145 previously known loci¹¹ in the discovery cohort. We found exact or proxy SNPs (SNPs in LD) for 101 of 145 loci in our dataset, of which sentinel metabolites in 80 loci were matched to metabolites in our cohort. For 13 loci, we replicated associations between the reported SNP and the reported metabolite at a Bonferroni p-value (p ≤ 0.05/80 = 6.25 × 10⁻⁴), and for another 15 loci, we replicated the association at the same/proxy SNP but with another metabolite (p < 0.05/(101 × 826) = 5.9 × 10⁻⁷) (Supplementary Data 2). In total, we replicated 28 loci (19.3% of 145 loci, 35% of 80 replicable loci)—including 11 of the 20 most significant loci¹¹.

21 common variant loci influence metabolites and ratios

To discover all metabolites associated with exome variants in 614 Qataris, 1.6 million imputed exome variants (272,061 SNPs after LD pruning) were tested for association with 826 metabolites (see Supplementary Fig. 1 for distribution of kinship-based heritability estimates for each super pathway). This step uncovered 3127 significant associations (Bonferroni p ≤ 2.2 × 10⁻¹⁰) (Supplementary Data 7), with an average inflation factor of 0.98 (range: 0.83–1.05). Those associations collapsed into 19 independent loci (see Methods), which we attempted to replicate (based on the sentinel SNP and the sentinel metabolite) in the imputed array data set (n = 382) and found that 17 of them replicated (p ≤ 0.05/19 = 0.0026) and one nominally replicated at p ≤ 0.05 (Table 2, Supplementary Datas 3–5).

Table 2 21 Unique locus-metabolite pairs, indicating 7 newly identified and novel loci, 10 loci fine mapped with new metabolite ratio associations, and 11 protein coding variants

Full size table

In addition to single-metabolite-variant associations, we examined each locus to identify significantly associated metabolite ratios¹⁹. To limit multiple testing, two approaches were used to examine associations of metabolic ratios. First, we computed the associations between SNPs (within 100 Kb of the sentinel SNP from single-metabolite association analysis) and the ratio of the sentinel metabolite to all remaining 825 metabolites. Second, for all SNPs where two metabolites had been nominally associated in the discovery phase (p ≤ 10⁻⁴) but in opposite directions (opposite beta signs), we computed the association of the given SNP to the ratio of that pair of metabolites. Both p-value and p-gain¹⁹ thresholds were used to find significant ratios. A total of 11 significant SNP to metabolite ratio associations were discovered with p ≤ 0.05/(826 × 18 + 826 × 272,061) = 2.2 × 10⁻¹⁰ and p-gain ≥(1/(2 × 0.05) × (826 × 18) = 1.48 × 10⁵ (see Methods). Seven of these resulted in metabolic fine-mapping of seven of the loci discovered in the single-metabolite phase (i.e. where a ratio replaced the single metabolite as the sentinel association), whereas the remaining four were new associations at loci not associated with single metabolites. Only three of those four replicated in the cohort of 382 individuals (Table 2, and Supplementary Datas 3–5), resulting in a total of 10 SNP to metabolite ratio associations.

Combined, we discovered 21 unique metabolite and metabolite ratio quantitative trait loci (mQTLs) in Qataris (Fig. 2, and Supplementary Figs. 2 and 3). The variance explained by these genetic variants ranges from the highest value of 31% (5-acetylamino-6-amino-3-methyluracil/1-methylxanthine with rs4646257 in NAT2 locus) to 7% (association of undecanedioate with c15p90683852 in SEMA4B locus) with an average of 14.8% (Table 2, Fig. 3). Of the 21 loci, 7 (31%) were unknown to studies that used older metabolomics platforms (studies prior to 2017) and are defined here as newly identified loci (four loci were concurrently identified in a study published while finalizing this manuscript¹⁵, and three loci are novel (Table 2, Supplementary Data 3). Five of those seven loci might be new due to the use of the new metabolomics platform, another two (TMPRSS11E and SEMA4B) were not discovered in ref. ¹¹, and other known loci were fine-mapped to new metabolites, and not reported elsewhere (as rs2147896 in PYROXD2 with N-methylpipecolate, and UGT3A1 with X-24348). We explored the frequencies of the sentinel SNPs in the Genome Aggregation Database (gnomAD)²⁰ (Supplementary Data 3), and found that among the seven loci, the sentinel SNP in two novel ones are not reported in the database (PHYHD1/NUP188, SEMA4B), and those in another two loci (SLC22A24, TTC38/PKDREJ) are rare in other populations.

All 10 reported metabolite ratio associations represent metabolic fine-mapping of loci discovered here or in previous studies. According to this cohort, those ratios associate more strongly than the single metabolites previously reported^11,15. These include the association of rs4646257 in NAT2 locus with the ratio 5-acetylamino-6-amino-3-methyluracil/1-methylxanthine, rs375811360 in NAT8 locus with 2-aminooctanoate/X-12511 (X-12511 is possibly 2-acetamidooctanoic acid as identified by Metabolon), rs3756669 in UGT3A1 locus with X-24348/pregn-steroid-monosulfate, rs28456 in FADS2 locus with 1-(1-enyl-palmitoyl)-2-arachidonoyl-GPC(P-16:0/20:4)/X-24438 (X-24428 is PC(P-16:0/20:3) as identified by Metabolon), rs6690449 in THEM4 locus with X-18921/X-23680 and rs1165196 in SLC17A1 locus with X-12824/X-16087 (X-12824 is hexanoylglutamine as identified by Metabolon) (see expanded loci associations–Supplementary Data 5). Notably, the variance explained by ratios was much higher than that explained by any single metabolite for the same locus (Table 2); for example, for the NAT2 locus, the variance explained by the ratio of metabolites is 3.9-fold greater than that explained by the single metabolite; similarly, 1.9-fold greater in the FADS2, UGT3A1 and THEM4 loci, 1.5-fold greater in SLC22A24 and TTC38/PKDREJ loci, and 1.3-fold greater in the NAT8 locus (comparison of ratios to single metabolites—Fig. 4).

We investigated the 21 loci for known eQTLs and disease or pharmacological associations using several databases (biological details are in Supplementary Data 6): GTEx portal (version 2.1, Build #201), OMIM database²¹, Orphanet disease database, CHEMBL targets database, PharmaGKB²² and SNIPA (see URLs in Methods). We found that seven sentinel SNPs are in genes encoding enzymes or transporters. Additionally, 16 sentinel SNPs are known eQTLs, including 5 at novel and newly identified loci (PHYHD1/NUP188, TTC38/PKDREJ, TMPRSS11E, SPTLC1P4/AL591893.1, CCBL2). Moreover, 10 loci contain genes linked to diseases, including 4 of the newly identified loci; namely, NUP188 has a role in heterotaxy²³, SEMA4B in hypoxia and lung cancer²⁴, CCBL2 in neurodegenerative diseases, and TMPRSS11E in squamous cell carcinomas²⁵. Finally, genes in 4 loci have been previously studied for effect on drugs.

Fine-mapping identifies functional variant associations

The availability of WES data in this study population allowed us to investigate if specific loci could be explained by functional (protein-altering) variants. Among the 21 loci, 11 (51%) harbored SNPs that were protein-altering or splice variants (Table 2, Supplementary Data 3). Two of these are among the novel or newly identified loci: the splice variant c.3662-4 A > G (rs2302811) in NUP188 (PHYHD1/NUP188 locus), association with X-22145 (2′-O-methyluridine) and the missense variant p.Ala12Val (rs117135869) in TTC38 (TTC38/PKDREJ locus) association with X-22162/X-24513, and the remaining nine are in previously reported mQTLs (Table 2). More importantly is the missense p.Cys121Gly (rs3756669) in UGT3A1 (Fig. 5a) which was found to be associated with undetected levels of X-24348 (these appear as missing values in the raw unprocessed metabolite levels) in the subjects which are homozygotes for the mutation, thus a potential loss of function in a pathway possibly related to the metabolite production²⁶ (see Discussion). Additionally, the association of the missense variant p.Gly209Ser (rs1799958) in ACADS (Fig. 5b) to ethylmalonate levels (concurrently reported¹⁵) is of clinical implication due to the association of the mutated allele with mild SCAD deficiency (MIM: 606885.0007).

Rare variants associated with metabolites in Qataris

In addition to common-variant mQTL discovery, we hypothesized that with a modest sample size, one may find rare functional variants influencing metabolite levels in homozygous state shared by more than one individual due to high levels of consanguinity. We selected high-quality protein-altering rare variants (MAF < 5%; n = 21,933 SNPs, see Methods). First, we performed gene-based burden testing for all genes harboring at least one of these rare variants (n = 9823 genes; Bonferroni p ≤ 6.16 × 10⁻⁹); and second, we tested single variant associations for SNPs with at least two homozygotes for the rare variant (n = 2660 SNPs in 2119 genes; Bonferroni p ≤ 2.27 × 10⁻⁸). In this analysis, we focused on rare, homozygous variants shared by two or more individuals, whose metabolite values were at either tail of the distribution (highest or lowest). After stringent QC and filtering, we discovered two genes with rare variants influencing metabolite levels through gene-burden analysis. In contrast to burden testing, we identified 10 variants significantly influencing metabolite levels through single-variant testing (Table 3, Fig. 6, Supplementary Fig. 5).

Discussion

We describe the first large-scale (n = 996 individuals) metabolite association study in a Middle Eastern population, combining deep WES data with an updated metabolomics platform. Altogether, we discover and replicate 21 common variant mQTLs. Amongst those, seven are novel or newly identified loci (associations that have been concurrently discovered¹⁵ and completely new ones, as indicated in Table 2), and 10 represent fine-mapping of previously reported loci to new metabolite ratios that associate more significantly than previously discovered single metabolites. Importantly, by having deep exome data, we fine-map a total of 11 loci to candidate protein-altering variants, with biological implications described below. Further, in the rare variant spectrum, we present 12 novel mQTL associations, all not previously reported.

Being able to replicate 35% of the 80 replicable loci reported in the largest mGWAS study to date¹¹ with one tenth the sample size may be attributable to the high levels of consanguinity in our population, which allows for sufficient numbers of alternate allele homozygotes to be present for metabolic associations. Loci replicated among different ethnicities are likely to belong to pathways common to human metabolism (7 out of the 10 most significant loci reported in¹¹—PYROXD2, ACADS, NAT8, FADS2, SLCO1B1, SULT2A1, and UGT1A1—being among the top associations in our study).

Several of the newly discovered associations may reveal putative biological links between SNPs and metabolites. One such example is the association of rs34109652 in TMPRSS11E with X-11491 (tentatively identified by Metabolon as the bile acid deoxycholic acid glucuronide). This particular SNP had been reported as an eQTL for UGT2B15 (GTEx version 4.1, Build #201), a gene at which CpG methylation was also significantly associated with X-11491 levels²⁷, consistent with the gene’s putative function of conjugating bile acids (GenAtlas(http://genatlas.medecine.univ-paris5.fr/fiche.php?n=6643)). Another biological association is that of the missense variant rs117135869 (p.Ala12Val) in TTC38 (TTC38/PKDREJ locus) associating significantly to X-22162, and more significantly, to the ratio of this metabolite to X-24513. TTC38 (Tetratricopeptide Repeat Domain 38) expression is significantly positively correlated with age²⁸, and levels of both metabolites are also significantly associated with age by regression analysis in our cohort (see Methods). Additionally, X-24513 is significantly partially correlated with C-mannosyltryptophan (also known as C-glycosyltryptophan) (see Methods), which was reported to be associated with aging²⁹. More interestingly, the fact that this locus encompasses PKDREJ (Polycystic Kidney Disease and Receptor for Egg Jelly related protein) gene (regional association plot—Supplementary Fig. 3), a homolog of PKD genes associating with kidney disease³⁰, and that X-24513 might be similar in characteristics to C-mannosyltryptophan, reportedly elevated in chronic kidney disease³¹, highlights the importance of investigating the biological implication of this association in relation to both kidney diseases and ageing. Finally, the association of rs7530513 in the aminotransferase CCBL2 (Cysteine-S-conjugate beta-lyase 2, also known as KYAT3 (Kynurenine–oxoglutarate transaminase 3) with imidazole lactate is another biologically relevant association; CCBL2 enzyme [EC2.6.1.7] has a transaminase activity towards histidine (Uniprot³²) and it has been reported that histidine transaminase [EC2.6.1.38], takes L-histidine and 2-oxyglutarate as substrates and produces imidazole pyruvate that is later converted to imidazole lactate³³. However, we are unable to confirm that the same histidine transaminase is the one involved in both processes.

A main advantage of using WES data in an association analysis is the enhanced resolution at associated loci that could reveal protein-altering variants affecting metabolite levels. We were able to identify such sentinel SNPs for over half of all loci (11 of 21), possibly revealing functional relationships between genes harboring these variants and their associated metabolites. One such example is the fine-mapping of a previously identified signal in the 3′UTR of ACADS to a missense mutation (rs1799958, p.Gly209Ser) in exon 6 (Fig. 5b). This mutation (MIM 606885.0007) has been reported independently as causing mild SCAD deficiency, with the mutant allele predicted to have 85% of wild-type activity³⁴. Further, the population-frequency of this variant (MAF ~25% in GnomAD, 37.8% in Qataris) is consistent with it having been the mild functional mutation in previously reported association signals between this locus and levels of both ethylmalonate and butyrylcarnitine, metabolites whose levels are known to be perturbed in SCAD deficiency³⁵.

Another important finding is at the UGT3A1 locus, where, in contrast to previous studies that reported an association of the intergenic SNP rs10491431 with steroid levels, we uncovered an association of the missense mutation rs3756669 (p.Cys121Gly) in UGT3A1 with the unknown metabolite X-24348 and more strongly with the ratio of X-24348 to pregn-steroid-monosulfate (Figs. 4 and 5). This unknown has shown significant partial correlation to two other metabolites that also significantly correlate with steroids (Supplementary Fig. 4a, see Methods). The missense mutation therefore possibly interferes with native gene function related to the production of this metabolite, since all minor homozygotes for this SNP had missing values of X-24348 (Supplementary Fig. 2) (p-value = 5.94 × 10⁻¹² for random occurrence of missing values, see Methods), suggesting the amino-acid substitution leads to loss of function of the gene. This hypothesis is supported by previous functional evidence demonstrating that this mutation leads to diminished UGT3A1 glucuronidation activity²⁶, especially of bile acids and estradiol. Metabolon has also recently studied this unknown and found that it might be a form of N-acetylglucosamine modification of a pregnandiol, yet cannot be confirmed at the time being. Together, the data suggest the unknown metabolite may itself be a conjugated steroid, which requires this conjugation to appear at detectable levels in the blood. Notably, the mutation appears to have population-specific allele distribution, having been observed in up to 40% of Asians, 15% of Europeans, but only 3% of Africans²⁶—a distribution which is maintained in our study cohort (overall Qatari MAF 12.8%), being present in Q1 and Q2 Qataris (Bedouin and Persian/South Asian), but not in Q3, who are of African descent³⁶.

In addition to identifying functional SNPs, the use of the updated metabolomics platform (DiscoveryHD4) enabled the refinement of previously reported loci to new metabolites or metabolite ratios in the same or in different biological pathway from those originally reported¹¹. For example, NAT2 encodes an arylamine transferase that controls the conversion of paraxanthine to 5-acetylamino-6-amino-3-methyluracil³³, and its locus has previously been reported to be associated with 1-methylxanthine. In this study, we discovered a stronger association with the ratios 5-acetylamino-6-amino-3-methyluracil/1-methylxanthine and 5-acetylamino-6-amino-3-methyluracil/paraxanthine, which are consistent with the known activity of the NAT2 enzyme in caffeine metabolism (see network of metabolites linked to NAT2, and KEGG pathway in Supplementary Fig. 4b, c). Being the top most significant association in this population in comparison to the top most ones previously reported^11,15, it might have several biological implications for the studied population. It is thus worthy to note that the two missense SNPs at this locus are identified in OMIM as associated with the rate of acetylation, presenting a possible mechanism for affecting drug metabolism²²; specifically, slow acetylation was reported for individuals harboring either of the two missense SNPs rs1799930 and rs1801280 (MIM: 612182.0001 and 612182.0002). Additionally, we discovered associations between SLC17A1 and the ratio hexanoylglutamine/X-16087. Previously, rs1185567 in SLC17A3 was reported to be associated with steroid levels¹¹. However, since the SLC17 family are vesicular glutamate transporters³⁷, our association signal, linking the missense variant p.Thr269Ile (rs1165196) to a ratio containing hexanoylglutamine may reflect a direct biological relationship. We also discovered mQTLs associating known loci to metabolites in pathways different from those previously reported. Once such example is the association between the missense variant rs2147896 (p.Met461Thr) in PYROXD2 with levels of N-methylpipecolate. This gene had previously been linked to levels of urinary trimethylamine¹⁸ and dimethylamine⁵, yet interestingly not found significantly associated with any of the mentioned metabolites in previous large scale studies^11,15. Finally, our study expands the associations of several loci with newly identified metabolites or ratios as in the NAT2 locus and FADS2 locus among others (Supplementary Data 5). One example that supports the functional importance of the discovered loci is the SLCO1B1 locus. Its sentinel SNP, rs4149056, was previously associated with the ratio isoleucine/X-11529¹¹, and other unknowns, and found in our study to be associated with the bile acids glycochenodeoxycholate-glucuronide(1) (which was also recently found to be the retired form of X-11529 by Metabolon), and glycocholenate sulfate. That supports the functionality of SLCO1B1 as a bile acid transporter, suggesting that the mutation p.Val174Ala alters the gene’s native function³⁸.

In addition to common mQTLs, we discovered a total of 12 rare mQTLs—10 by single-variant analysis and 2 by gene-burden analysis—with interesting biological implications (Table 3 and Supplementary Fig. 5). First, the ACAN gene encodes a major component of the extracellular matrix, lending important biomechanical properties of cartilage, which explains its role in diseases such as osteoarthritis (MIM: 155760). We discovered an association of a missense variant p.Ala766Thr (rs150988100) in ACAN with a steroid (X-12844, tentatively identified by Metabolon as glucuronidated steroid); it is well-established that circulating steroid levels are linked to inflammation in joints and diseases such as arthritis^39,40,41,42. Additional investigations would be needed to uncover the relation between the variant and the metabolite since one of the two subjects having this mutation have musculoskeletal problems. Similarly, the association of rs377301648 in ZNF133 with a 3-methyl-2-oxovalerate, a metabolite in the branched-chain amino-acid pathway might be due to the role of the gene in osteoblast differentiation⁴³ and the previously reported involvement of 3-methyl-2-oxovalerate in osteoarthritis⁴⁴. Additionally, association of the c12p11506114 mutation in PRB1 with mannose (Fig. 6) might be due to the involvement of this glycosylated, proline-rich protein in the salivary secretion pathway⁴⁵, (Michael W King, PhD|© 1996–2016 themedicalbiochemistrypage.org)⁴⁶.

Table 3 12 novel functional rare variant mQTLs

Full size table

Another interesting rare variant association is that between rs56332208 in OTOF with retinol (vitamin A). This might be a disease-relevant link because retinoic acid mediates the regeneration of specialized mechano-sensory hair cells in the inner ear that capture auditory and balance sensory inputs, and which die after acoustic trauma, ototoxic drugs or aging diseases, leading to progressive hearing loss⁴⁷. Previous studies have also described mutations in OTOF causing recessive neurosensory non-syndromic deafness in patient cohorts from many different ethnicities via the gene’s role in exocytosis of inner and outer hair cells^{48,49,50,51,52}. Thus, our study may provide a mechanistic link between this gene and hair cell development via modulation of retinol levels. Finally, burden-testing revealed mQTLs in AASDH and METTL7B associated with decreased levels of thyroxine and androsterone sulfate respectively. The AASDH gene plays an active role in the pipecolate pathway in which thyroxine is a by-product⁵³. Thus, these mutations may impact protein function leading to significant reduction of thyroid hormone and metabolism. Regarding the METTL7B mQTL, one of the three mutations in this gene (rs115687886) is a nonsense mutation, suggesting the other two are also loss-of-function mutations as they influence the metabolite in the same direction. Increases in METTL7B expression had been previously observed in patients with acute respiratory distress syndrome, involving tissue injury and inflammation⁵⁴, whereas adrenal androgens as androsterone sulfate have been observed to decrease in stress and inflammation^39,40,41,42. The fact that one of the two subjects with the rs199581976 mutation had recurrent chest infections and pneumonia and the other subject had bronchial asthma, recurrent pneumonia, and right lung lower lobe collapse, supports the functionality of the gene, and thus provides another possible link between a rare variant and disease via metabolite-level modulation.

To summarize, our study revealed 21 mQTLs in Qataris, among which 7 are unknown to studies that used older metabolomics platforms, 10 are metabolically fine-mapped with new metabolite ratios, and 11 in which the sentinel SNP was at or in complete LD with protein-altering variants. We also discovered 12 novel functional rare variant mQTLs that are likely specific to this ethnic population. We believe that it is important to replicate rare variant associations, yet the low frequency of these variants and absence of similar cohorts makes it challenging. This study demonstrates the efficiency of using WES for mQTL discovery, which could be more convenient compared to the more expensive WGS, while providing deep coverage of protein coding variants, for fine-mapping of common mQTLs and being suitable for rare variant mQTL discovery. The use of WES is strongly supported by the ability to replicate a fair fraction of the mQTLs in Caucasians (19% of all loci and 35% of replicable loci). Finally, this study is the largest and the first of its kind in a Middle Eastern population, where we show that studies in consanguineous populations offer a large insight with modest sample sizes and have the potential to reveal loci linked to disease and pharmacological/drug targets important to precision medicine in this region of the world.

Methods

Study cohort

Human subjects were recruited and written informed consent was obtained at Hamad Medical Corporation (HMC) and HMC Primary Health Care Centers in Doha, Qatar and approved by the Institutional Review Boards of Hamad Medical Corporation and Weill Cornell Medicine in Qatar. Briefly, a total of 614 subjects were recruited for exome sequencing, and another 382 subjects for array genotyping. Subjects were included on the basis of being 3-generation Qataris (four grandparents born in Qatar) and being healthy or diabetics, above the age of 30 years. Sample demographic characteristics are given in Table 1.

Exome sequencing

DNA of 614 subjects was extracted from blood using the QIAamp DNA Blood Maxi Kit (Qiagen Sciences Inc, Germantown, MD) and subjected to exome sequencing on the Illumina HiSeq 2000 platform using standard methods. Each subject was sequenced to a minimum mean depth of 70X. Genotypes were generated using the GATK Best Practices workflow⁵⁵ (Whole exome data for a set of recently received 64 samples were merged with those of the 550 samples after calling variants using GATK on each and finding that the majority of SNPs detected in both sets overlap. We limited the analysis to variants in both sets by setting a genotype call rate of ≥98% for common variant analysis and 90% for rare variant analysis). Detailed preparation methods and genotypes are available in⁵⁶. The DNA of 382 subjects was extracted and subjected to array genotyping using Illumina Omni 2.5M array kits.

Imputation and filtering

Exome and array data were imputed after being filtered (MAF ≥ 0.05, p_HWE > 10⁻⁶, genotype call rate ≥98%) based on phased 108 Qatari whole genomes as a reference panel, using shapeit⁵⁷ and Impute2 software packages^58,59. Throughout all the manuscript and supplementary data sets/information, two unique notations of “c”#“p”# and #:# for indicating a chromosome, position pair (for example c1p100 and 1:100) are used to distinguish original exome SNPs and imputed exome SNPs, respectively. A total of 22 million SNPs were imputed into 614 Qatari Exomes. For common-variant association analysis, 1,650,892 SNPs were used, after removing SNPs with imputation quality R² < 0.5, MAF < 0.05, genotype call rate <98%, and pHWE <10⁻⁶. For rare variant analysis, the methods are described separately below. To define a Bonferroni threshold for finding significant associations, we used the pruned set of SNPs at LD 0.8 (using plink command indep-pairwise 50 5 0.8) that resulted in a total of 272,061 SNPs. Replication was performed in array-genotyped data from a separate set of 382 samples, where imputation using the same 108 Qatari genomes produced a total of 11 million high-quality SNPs.

Metabolomics data

Serum samples were prepared for metabolomics analysis as follows: 200 μl of serum were aliquoted, barcoded and transported on dry ice to Metabolon Inc. for analysis. Some samples were found hemolyzed, and the degree of hemolysis was recorded for each sample (according to a hemolysis chart as given by Metabolon Inc.), for correcting for its effect on metabolomics measurements. The Metabolon DiscoveryHD4 platform was used (details are given in Supplementary Note 1). This platform utilized a Waters ACQUITY ultra-performance liquid chromatography (UPLC) and a Thermo Scientific Q-Exactive high resolution/accurate mass spectrometer interfaced with a heated electrospray ionization (HESI-II) source and Orbitrap mass analyzer operated at 35,000 mass resolution. A total of 1303 metabolites were measured on that platform. Outlier metabolite measurements (3 standard deviations larger than the mean) were replaced by a missing value to avoid biasing the results. Metabolites with more than 20% missing values were removed from the data. A total of 826 metabolites (including 249 unknown metabolites) survived as high quality and observed in >80% of all individuals. Metabolite measurements were log-scaled and z-score normalized. Since samples were collected over different periods of time, and there was no adopted fasting criterion, we addressed that issue by investigating whether there are batch effects in the PCA of metabolic data, and did not find any grouping for any of the 614 exome samples or the 382 array samples. Several unknown metabolites were investigated by the aid of Metabolon, and which revealed their identities (see below).

Identification of unknown metabolites by Metabolon

Identification of tentative structural features for unknown biochemicals incorporates a detailed analysis of mass spec data, i.e., gathering information such as the accurate monoisotopic mass, the elution time and fragmentation pattern of the primary ion, and correlation to other molecules. The accurate monoisotopic mass is used to identify a likely structural formula for the unknown biochemical, which is then used to search against chemical structure databases (e.g., ChemSpider, SciFinder). When a candidate structure fits the accurate monoisotopic mass and fragmentation data, an authentic standard is commercially purchased or synthesized (when possible). Conformation of a proposed structure is based on a match to three primary criteria, including co-elution with the unknown molecule of interest, and a high degree match to both the accurate monoisotopic mass and fragmentation pattern. When a standard is not available to confirm the identity of the unknown biochemical but sufficient data exist to support a high degree of confidence in its identity, the unknown biochemical may be retired for a named structure that is differentiated from other named metabolites confirmed with authentic standards by the addition of an asterisk after the biochemical name. When a high degree of confidence in the identity of the unknown molecule is not obtained, the molecule retains its unnamed status as designated by an 'X-' in the molecule name. Refer to Supplementary Note 2 for the particular identification information on each of the identified unknown metabolites, and Supplementary Fig. 6 for the comparison of MS² fragmentation spectrum and Extracted Ion Chromatogram of the candidate and unknown metabolite that were structurally confirmed by Metabolon.

mGWAS analysis

All associations were performed on imputed exome or imputed array genotype data. P-values and effect sizes were calculated using functions from both GenABEL and ProbABEL packages in R (version 3.1.2) that were used for computing associations between SNPs and metabolite/ratio levels (after being pre-processed as described in the previous sections) while correcting for age, gender, BMI, T2D, hemolysis, population stratification (using two PCA components) and kinship (for family relatedness). The “polygenic” function in GenABEL was used for correcting metabolite levels for covariates and kinship matrix, and regression analysis using “mmscore” function was used to find association between the residuals obtained from this function and the SNPs using an additive inheritance model. Heritability estimates are based on “esth2” values resulting from this function. Kinship matrix was computed based on the genotype data using the “ibd” function in GenABEL. The mean inflation factor for all metabolites was 0.98. Percent of variance is calculated as follows: r² = χ²/(N − 2 + χ²), where χ² = (β/s.e.)², β and s.e. are the β and the standard error of the β, as obtained from the regression results, and N is the number of samples involved in this association. Regional association plots were produced using an in-house tool similar to that of (http://snipa.helmholtz-muenchen.de/snipa/), yet based on LD of subjects in the exome data used here.

Loci and sentinel SNPs

Association results were divided into 500 Kbp blocks, and in each of these the sentinel SNP and sentinel metabolite are defined according to the SNP-metabolite association with the highest significance. Those define the mQTLs. In the case where a locus had two significant SNPs not in LD (i.e., LD < 0.5) and lying in two different genes, it was broken down into two loci (as in the case of THEM4 gene and SPTLC1P4/AL591893.1, where those two loci are at LD = 0.3). However, where there are two SNPs not in LD (LD < 0.5) but are in the same gene, we did not split the locus and we report LD values for those SNPs (in NAT2 and FADS2 genes) in Supplementary Data 5. All functional variants—missense and splice variants reported are in LD with the sentinel SNP of the locus except for the NAT2 locus where the two missense SNPs are not in LD with the sentinel SNP.

Ratios analysis

p-gain was introduced in ref. ¹⁹ as an “ad-hoc measure to determine whether a ratio between two metabolite concentrations carries more information than the two corresponding metabolite concentrations alone”. It is calculated as: p-gain = min(p-val(m1), p-val(m2))/p-val(m1/m2), given two metabolites m1 and m2, and the p-values (p-val) of the association of the metabolite/ratio to the same SNP. The p-gain threshold according to¹⁹ assumes a level of significance of α/B, where a type I error rate of α = 0.05 is used, and the critical value for the p-gain is B/(2⋅α), i.e. 10⋅B. Thus for Bonferroni correction of B tests, the uncorrected critical value of 10 can be multiplied by the number of tests B. Accordingly, we set the value of 10 × 826 × 18 (10×number of metabolites×number of loci obtained from single metabolites) as an approximate threshold (we took the maximum of the number of tests of the first and the second methods where the first method (as mentioned earlier) considers the ratio of the sentinel metabolite against all the rest of 825 metabolites, tested with SNPs in 100 KB of the sentinel SNP of this locus in each of the loci discovered by single-metabolite analysis, and the second method considers the ratio of two metabolites associated with the same SNP but with opposite beta signs, in this case 13,281 associations were tested).

Fisher exact test for randomness of missing values

This was used to detect whether the missing metabolite values from certain genotype groups (AA, AB, and BB) were missing by chance, when studying the effect of a missense on the metabolite level (after making sure the missingness was not due to removing outliers in the pre-processing of metabolite levels). The two inputs for the Fisher’s exact test are: (a) the number of missing values from each genotype group, and (b) the total number of samples in each genotype group. fisher.test method in R was used for this calculation. The low p-value indicates a low probability having these missing values by chance.

Metabolite regression analysis and partial correlations

To find associations between selected metabolites and other phenotypes as age (as in the case of X-22162 and X-24513), the metabolite level is regressed against age, gender, BMI, hemolysis, T2D, and population stratification (for inclusion of ethnicity) for all 996 individuals using lm function in R. Bonferroni p-value is used to report a significant association with a phenotype (p ≤ 0.05/826). Partial correlations between two metabolites are calculated using the GeneNet package in R, and significant partial correlations are those which pass a Bonferroni p-value of (p ≤ 0.05/(826 × 825/2)).

Rare variants analysis

High-quality rare variants were selected from the original non-imputed exome data, with MAF ≤ 5%, p_HWE > 10⁻⁶, and genotype call rate ≥90%. Single-variant analysis was done using GenABEL package in R (as mentioned above for common variants), and burden test was done using seqMeta package in R (as in refs. ^60,61). For burden tests, we accounted for the number of genes (n = 9823 genes; Bonferroni p ≤ 6.16 × 10⁻⁹), and for single-variant analysis, we accounted for the number of SNPs where we tested single-variant associations for SNPs with at least two homozygotes for the rare variant (n = 2660 SNPs in 2119 genes; Bonferroni p ≤ 2.27 × 10⁻⁸). All SNPs in the reported 12 rare variant mQTLs have a call rate of 100% except for c12p11506114 with a call rate of 97%. A filtering step was done by visual inspection of genotype-metabolite boxplots of significant associations to consider only associations where metabolite values of the minor homozygotes are at the extreme tail of the metabolite distribution (i.e., they should be the lowest or highest metabolite values for that metabolite).

URLs of databases used for annotation

GTEx portal (version 2.1, Build #201)[www.gtexportal.org], OMIM diseases database [www.omim.org], Orphanet disease database [http://www.orpha.net] CHEMBL targets database [www.ebi.ac.uk/chembl], PharmaGKB [www.pharmgkb.org], SNIPA [http://snipa.helmholtz-muenchen.de/snipa/], GenAtlas (http://genatlas.medecine.univ-paris5.fr/fiche.php?n=6643), and GnomAD: http://gnomad.broadinstitute.org/.

Data availability

All information on metabolites are in Supplementary Data 1, expanded association results for 21 common variant mQTLs are in Supplementary Data 5, and the detailed biological annotations/interpretations of associations are in Supplementary Data 6, list of all associations (p < = 1.4 × 10⁻⁷) are in Supplementary Data 7. All plots are also available in Supplementary Information and detailed information on Metabolon’s method for identification of unknowns is available in the Supplementary Note. Exome data used in this project were selected from a pool of samples of which more than 1000 samples are deposited in SRA accessions SRP060765, SRP061943, and SRP061463, accessible online at http://www.ncbi.nlm.nih.gov/Traces/study/?acc=SRP060765%2CSRP061943%2CSRP061463&go=go). (SRA accession SRP061943).

References

Yousri, N. A. et al. Long term conservation of human metabolic phenotypes and link to heritability. Metabolomics 10, 1005–1017 (2014).
Article CAS PubMed PubMed Central Google Scholar
Beger, R. D. et al. Metabolomics enables precision medicine: “A white paper, community perspective”. Metabolomics 12, 149 (2016).
Article PubMed PubMed Central Google Scholar
Gieger, C. et al. Genetics meets metabolomics: a genome-wide association study of metabolite profiles in human serum. PLoS Genet. 4, e1000282 (2008).
Article PubMed PubMed Central Google Scholar
Illig, T. et al. A genome-wide perspective of genetic variation in human metabolism. Nat. Genet. 42, 137–141 (2010).
Article CAS PubMed Google Scholar
Nicholson, G. et al. A genome-wide metabolic QTL analysis in Europeans implicates two loci shaped by recent positive selection. PLoS Genet. 7, e1002270 (2011).
Article CAS PubMed PubMed Central Google Scholar
Suhre, K. et al. A genome-wide association study of metabolic traits in human urine. Nat. Genet. 43, 565–569 (2011).
Article CAS PubMed Google Scholar
Draisma, H. H. et al. Genome-wide association study identifies novel genetic variants contributing to variation in blood metabolite levels. Nat. Commun. 6, 7208 (2015).
Article CAS PubMed PubMed Central Google Scholar
Kastenmuller, G., Raffler, J., Gieger, C. & Suhre, K. Genetics of human metabolism: an update. Hum. Mol. Genet. 24, R93–r101 (2015).
Article PubMed PubMed Central Google Scholar
Yet, I. et al. Genetic influences on metabolite levels: a comparison across metabolomic platforms. PLoS ONE 11, e0153672 (2016).
Article PubMed PubMed Central Google Scholar
Kettunen, J. et al. Genome-wide study for circulating metabolites identifies 62 loci and reveals novel systemic effects of LPA. Nat. Commun. 7, 11122 (2016).
Article ADS CAS PubMed PubMed Central Google Scholar
Shin, S. Y. et al. An atlas of genetic influences on human blood metabolites. Nat. Genet. 46, 543–550 (2014).
Article CAS PubMed PubMed Central Google Scholar
Demirkan, A. et al. Insight in genome-wide association of metabolite quantitative traits by exome sequence analyses. PLoS Genet. 11, e1004835 (2015).
Article PubMed PubMed Central Google Scholar
Guo, L. et al. Plasma metabolomic profiles enhance precision medicine for volunteers of normal health. Proc. Natl Acad. Sci. USA 112, E4901–E4910 (2015).
Article CAS PubMed PubMed Central Google Scholar
Yazdani, A., Yazdani, A., Liu, X. & Boerwinkle, E. Identification of rare variants in metabolites of the carnitine pathway by whole genome sequencing analysis. Genet. Epidemiol. 40, 486–491 (2016).
Article PubMed PubMed Central Google Scholar
Long, T. et al. Whole-genome sequencing identifies common-to-rare variants associated with human blood metabolites. Nat. Genet. 49, 568–578 (2017).
Article CAS PubMed Google Scholar
Popejoy, A. B. & Fullerton, S. M. Genomics is failing on diversity. Nature 538, 161–164 (2016).
Article ADS CAS PubMed PubMed Central Google Scholar
Raffler, J. et al. Genome-wide association study with targeted and non-targeted nmr metabolomics identifies 15 novel loci of urinary human metabolic individuality. PLoS Genet. 11, e1005487 (2015).
Article PubMed PubMed Central Google Scholar
Rueedi, R. et al. Genome-wide association study of metabolic traits reveals novel gene-metabolite-disease links. PLoS Genet. 10, e1004132 (2014).
Article PubMed PubMed Central Google Scholar
Petersen, A. K. et al. On the hypothesis-free testing of metabolite ratios in genome-wide and metabolome-wide association studies. BMC Bioinformatics. 13, 120 (2012).
Article PubMed PubMed Central Google Scholar
Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).
Article CAS PubMed PubMed Central Google Scholar
McKusick, V. A. Mendelian Inheritance in Man and its online version, OMIM. Am. J. Hum. Genet. 80, 588–604 (2007).
Article CAS PubMed PubMed Central Google Scholar
Whirl-Carrillo, M. et al. Pharmacogenomics knowledge for personalized medicine. Clin. Pharmacol. Ther. 92, 414–417 (2012).
Article CAS PubMed PubMed Central Google Scholar
Fakhro, K. A. et al. Rare copy number variations in congenital heart disease patients identify unique genes in left-right patterning. Proc. Natl Acad. Sci. USA 108, 2915–2920 (2011).
Article ADS CAS PubMed PubMed Central Google Scholar
Jian, H., Liu, B. & Zhang, J. Hypoxia and hypoxia-inducible factor 1 repress SEMA4B expression to promote non-small cell lung cancer invasion. Tumour Biol. 35, 4949–4955 (2014).
Article CAS PubMed Google Scholar
Lang, J. C. & Schuller, D. E. Differential expression of a novel serine protease homologue in squamous cell carcinoma of the head and neck. Br. J. Cancer 84, 237–243 (2001).
Article CAS PubMed PubMed Central Google Scholar
Mackenzie, P. I. et al. Identification of UDP glycosyltransferase 3A1 as a UDP N-acetylglucosaminyltransferase. J. Biol. Chem. 283, 36205–36210 (2008).
Article CAS PubMed PubMed Central Google Scholar
Petersen, A. K. et al. Epigenetics meets metabolomics: an epigenome-wide association study with blood serum metabolic traits. Hum. Mol. Genet. 23, 534–545 (2014).
Article CAS PubMed Google Scholar
van den Akker, E. B. et al. Meta-analysis on blood transcriptomic studies identifies consistently coexpressed protein-protein interaction modules as robust markers of human aging. Aging Cell 13, 216–225 (2014).
Article PubMed Google Scholar
Menni, C. et al. Metabolomic markers reveal novel pathways of ageing and early development in human populations. Int. J. Epidemiol. 42, 1111–1119 (2013).
Article PubMed PubMed Central Google Scholar
Veldhuisen, B., Spruit, L., Dauwerse, H. G., Breuning, M. H. & Peters, D. J. Genes homologous to the autosomal dominant polycystic kidney disease genes (PKD1 and PKD2). Eur. J. Hum. Genet. 7, 860–872 (1999).
Article CAS PubMed Google Scholar
Sekula, P. et al. A Metabolome-Wide Association Study of kidney function and disease in the general population. J. Am. Soc. Nephrol. 27, 1175–1188 (2016).
Article CAS PubMed Google Scholar
UniProt Consortium. UniProt: a hub for protein information. Nucleic Acids Res. 43, D204–D212 (2014).
Article Google Scholar
Kanehisa, M. & Goto, S. Kegg: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27–30 (2000).
Article CAS PubMed PubMed Central Google Scholar
Corydon, M. J. et al. Role of common gene variations in the molecular pathogenesis of short-chain acyl-CoA dehydrogenase deficiency. Pediatr. Res. 49, 18–23 (2001).
Article CAS PubMed Google Scholar
van Maldegem, B. T. et al. Clinical, biochemical, and genetic heterogeneity in short-chain acyl-coenzyme A dehydrogenase deficiency. JAMA 296, 943–952 (2006).
Article PubMed Google Scholar
Hunter-Zinck, H. et al. Population genetic structure of the people of Qatar. Am. J. Hum. Genet. 87, 17–25 (2010).
Article CAS PubMed PubMed Central Google Scholar
Omote, H., Miyaji, T., Hiasa, M., Juge, N. & Moriyama, Y. Structure, function, and drug interactions of neurotransmitter transporters in the postgenomic era. Annu. Rev. Pharmacol. Toxicol. 56, 385–402 (2016).
Article CAS PubMed Google Scholar
Xiang, X. et al. Effect of SLCO1B1 polymorphism on the plasma concentrations of bile acids and bile acid synthesis marker in humans. Pharmacogenet. Genomics 19, 447–457 (2009).
Article CAS PubMed Google Scholar
Cutolo, M. Androgens in rheumatoid arthritis: when are they effectors? Arthritis Res. Ther. 11, 126 (2009).
Article PubMed PubMed Central Google Scholar
Cutolo, M. et al. New roles for estrogens in rheumatoid arthritis. Clin. Exp. Rheumatol. 21, 687–690 (2003).
CAS PubMed Google Scholar
Cutolo, M. et al. Androgens and estrogens modulate the immune and inflammatory responses in rheumatoid arthritis. Ann. N. Y. Acad. Sci. 966, 131–142 (2002).
Article ADS CAS PubMed Google Scholar
Cutolo, M., Sulli, A., Pizzorni, C., Craviotto, C. & Straub, R. H. Hypothalamic-pituitary-adrenocortical and gonadal functions in rheumatoid arthritis. Ann. N. Y. Acad. Sci. 992, 107–117 (2003).
Article ADS CAS PubMed Google Scholar
Stains, J. P. & Civitelli, R. Genomic approaches to identifying transcriptional regulators of osteoblast differentiation. Genome Biol. 4, 222 (2003).
Article PubMed PubMed Central Google Scholar
Adams, S. B. Jr. et al. Global metabolic profiling of human osteoarthritic synovium. Osteoarthritis Cartilage 20, 64–67 (2012).
Article PubMed Google Scholar
Stubbs, M. et al. Encoding of human basic and glycosylated proline-rich proteins by the PRB gene complex and proteolytic processing of their precursor proteins. Arch. Oral Biol. 43, 753–770 (1998).
Article CAS PubMed Google Scholar
Mandel, I. D. in The Glycoconjugates (ed. Horowitz, M. I. & Pigman, W.) 153–179 (Academic Press, 1977).
Rubbini, D., Robert-Moreno, A., Hoijman, E. & Alsina, B. Retinoic acid signaling mediates hair cell regeneration by repressing p27kip and sox2 in supporting cells. J. Neurosci. 35, 15752–15766 (2015).
Article CAS PubMed Google Scholar
Stelzer, G. et al. The GeneCards suite: from gene data mining to disease genome sequence analyses. Curr. Protoc. Bioinform. 54, 1.30.1-1.30.33 (2016).
Article Google Scholar
Yasunaga, S. et al. A mutation in OTOF, encoding otoferlin, a FER-1-like protein, causes DFNB9, a nonsyndromic form of deafness. Nat. Genet. 21, 363–369 (1999).
Article CAS PubMed Google Scholar
Mirghomizadeh, F. et al. Substitutions in the conserved C2C domain of otoferlin cause DFNB9, a form of nonsyndromic autosomal recessive deafness. Neurobiol. Dis. 10, 157–164 (2002).
Article CAS PubMed Google Scholar
Adato, A., Raskin, L., Petit, C. & Bonne-Tamir, B. Deafness heterogeneity in a Druze isolate from the Middle East: novel OTOF and PDS mutations, low prevalence of GJB2 35delG mutation and indication for a new DFNB locus. Eur. J. Hum. Genet. 8, 437–442 (2000).
Article CAS PubMed Google Scholar
Choi, B. Y. et al. Identities and frequencies of mutations of the otoferlin gene (OTOF) causing DFNB9 deafness in Pakistan. Clin. Genet. 75, 237–243 (2009).
Article CAS PubMed PubMed Central Google Scholar
Hallen, A., Jamie, J. F. & Cooper, A. J. Lysine metabolism in mammalian brain: an update on the importance of recent discoveries. Amino Acids 45, 1249–1272 (2013).
Article CAS PubMed Google Scholar
Kovach, M. A. et al. Microarray analysis identifies IL-1 receptor type 2 as a novel candidate biomarker in patients with acute respiratory distress syndrome. Respir. Res. 16, 29 (2015).
Article ADS PubMed PubMed Central Google Scholar
DePristo, M. A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 43, 491–498 (2011).
Article CAS PubMed PubMed Central Google Scholar
Fakhro, K. A. et al. The Qatar genome: a population-specific tool for precision medicine in the Middle East. Hum. Genome Var. 3, 16016 (2016).
Article PubMed PubMed Central Google Scholar
Delaneau, O., Marchini, J. & Zagury, J. F. A linear complexity phasing method for thousands of genomes. Nat. Methods 9, 179–181 (2011).
Article PubMed Google Scholar
Howie, B. N., Donnelly, P. & Marchini, J. A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet. 5, e1000529 (2009).
Article PubMed PubMed Central Google Scholar
Howie, B., Marchini, J. & Stephens, M. Genotype imputation with thousands of genomes. G3 1, 457–470 (2011).
Article PubMed PubMed Central Google Scholar
Rhee, E. P. & Yang, Q. An exome array study of the plasma metabolome. Nat. Commun. 7, 12360 (2016).
Article ADS CAS PubMed PubMed Central Google Scholar
Li, B. & Leal, S. M. Methods for detecting associations with rare variants for common diseases: application to analysis of sequence data. Am. J. Hum. Genet. 83, 311–321 (2008).
Article CAS PubMed PubMed Central Google Scholar
Yousri, N. A. et al. Large scale metabolic profiling identifies novel steroids linked to rheumatoid arthritis. Sci. Rep. 7, 9137 (2017).
Article ADS PubMed PubMed Central Google Scholar

Download references

Acknowledgements

This work was supported by Biomedical Research Program (BMRP) in Weill Cornell Medicine-Qatar, funded by Qatar Foundation. It was also supported by Qatar National Research Funds (QNRF) grants numbers: 09-740-3-192, 09-741-3-793 and in part by grant number 7-272-1-041. We acknowledge the efforts of Greg Schaaf (Metabolon) for synthesizing standards used in the identification of structurally unknown molecules described in this manuscript.

Author information

Noha A. Yousri and Khalid A. Fakhro contributed equally to this work.

Authors and Affiliations

Genetic Medicine, Weill Cornell Medicine-Qatar, PO Box 24144, Doha, Qatar
Noha A. Yousri, Khalid A. Fakhro, Amal Robay, Hassina Zeriri, Tala Odeh, Alya Al-Shakaki, Omar M. Chidiac & Joel A. Malek
Computer and Systems Engineering, Alexandria University, Alexandria, Egypt
Noha A. Yousri
Sidra Medical Research Center, Department of Human Genetics, PO Box 26999, Doha, Qatar
Khalid A. Fakhro
Genetic Medicine, Weill Cornell Medicine, New York, NY, 10065, USA
Juan L. Rodriguez-Flores, Jason G. Mezey & Ronald G. Crystal
Metabolon Inc, Durham, NC, 27560, USA
Robert P. Mohney
Physiology and Biophysics, Weill Cornell Medicine-Qatar, PO Box 24144, Doha, Qatar
Sara Abdul Kader, Gaurav Thareja, Manish Kumar & Karsten Suhre
Genomics Core, Weill Cornell Medicine-Qatar, PO Box 24144, Doha, Qatar
Eman K. Aldous, Yasmin A. Mohamoud & Joel A. Malek

Authors

Noha A. Yousri
View author publications
You can also search for this author in PubMed Google Scholar
Khalid A. Fakhro
View author publications
You can also search for this author in PubMed Google Scholar
Amal Robay
View author publications
You can also search for this author in PubMed Google Scholar
Juan L. Rodriguez-Flores
View author publications
You can also search for this author in PubMed Google Scholar
Robert P. Mohney
View author publications
You can also search for this author in PubMed Google Scholar
Hassina Zeriri
View author publications
You can also search for this author in PubMed Google Scholar
Tala Odeh
View author publications
You can also search for this author in PubMed Google Scholar
Sara Abdul Kader
View author publications
You can also search for this author in PubMed Google Scholar
Eman K. Aldous
View author publications
You can also search for this author in PubMed Google Scholar
Gaurav Thareja
View author publications
You can also search for this author in PubMed Google Scholar
Manish Kumar
View author publications
You can also search for this author in PubMed Google Scholar
Alya Al-Shakaki
View author publications
You can also search for this author in PubMed Google Scholar
Omar M. Chidiac
View author publications
You can also search for this author in PubMed Google Scholar
Yasmin A. Mohamoud
View author publications
You can also search for this author in PubMed Google Scholar
Jason G. Mezey
View author publications
You can also search for this author in PubMed Google Scholar
Joel A. Malek
View author publications
You can also search for this author in PubMed Google Scholar
Ronald G. Crystal
View author publications
You can also search for this author in PubMed Google Scholar
Karsten Suhre
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

N.A.Y., K.A.F., K.S. and R.G.C. conceived the study. N.A.Y. and K.A.F. led the study, interpreted genetic associations and wrote the paper. N.A.Y. performed all statistical and computational analysis, imputation, compilation and annotation of results, and interpretation of metabolic associations. K.A.F., J.L.R.-F., A.R., J.G.M. and R.G.C. provided all genotype and phenotype information for the Qatari subjects. R.P.M. provided Metabolon support for interpretation of unknown metabolites. A.A.-S. and O.M.C. performed sample collection and handling. S.A.K., T.O. and H.Z. aliquoted samples for Metabolon, and performed DNA extraction and processing in the genomic core. J.A.M., Y.A.M. and E.K.A. were responsible for the Genomic core for Illumina genotyping and NGS. G.T. provided pipeline tools for imputation and provided phased reference genome. M.K. provided plotting tools and aided with plotting results.

Corresponding authors

Correspondence to Noha A. Yousri, Khalid A. Fakhro or Karsten Suhre.

Ethics declarations

Competing interests

R.P.M. is an employee of Metabolon Inc. The remaining authors declare no competing financial interests.

Additional information

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Supplementary Information

Description of Additional Supplementary Files

Supplementary Data 1

Supplementary Data 2

Supplementary Data 3

Supplementary Data 4

Supplementary Data 5

Supplementary Data 6

Supplementary Data 7

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Yousri, N.A., Fakhro, K.A., Robay, A. et al. Whole-exome sequencing identifies common and rare variant metabolic QTLs in a Middle Eastern population. Nat Commun 9, 333 (2018). https://doi.org/10.1038/s41467-017-01972-9

Download citation

Received: 09 May 2017
Accepted: 30 October 2017
Published: 23 January 2018
DOI: https://doi.org/10.1038/s41467-017-01972-9

This article is cited by

Plasma metabolomics profiles in Black and White participants of the Adventist Health Study-2 cohort
- Fayth M. Butler
- Jason Utt
- Gary E. Fraser
BMC Medicine (2023)
Integrated epigenome, whole genome sequence and metabolome analyses identify novel multi-omics pathways in type 2 diabetes: a Middle Eastern study
- Noha A. Yousri
- Omar M. E. Albagha
- Steven C. Hunt
BMC Medicine (2023)
Genomic atlas of the plasma metabolome prioritizes metabolites implicated in human diseases
- Yiheng Chen
- Tianyuan Lu
- J. Brent Richards
Nature Genetics (2023)
Metabolomic epidemiology offers insights into disease aetiology
- Harriett Fuller
- Yiwen Zhu
- Burcu F. Darst
Nature Metabolism (2023)
Genetic studies of paired metabolomes reveal enzymatic and transport processes at the interface of plasma and urine
- Pascal Schlosser
- Nora Scherer
- Anna Köttgen
Nature Genetics (2023)

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.

Subjects

Abstract

Similar content being viewed by others

Introduction

Results

Subject selection and genotyping

Pre-discovery exploration of replication in Caucasians

21 common variant loci influence metabolites and ratios

Fine-mapping identifies functional variant associations

Rare variants associated with metabolites in Qataris

Discussion

Methods

Study cohort

Exome sequencing

Imputation and filtering

Metabolomics data

Identification of unknown metabolites by Metabolon

mGWAS analysis

Loci and sentinel SNPs

Ratios analysis

Fisher exact test for randomness of missing values

Metabolite regression analysis and partial correlations

Rare variants analysis

URLs of databases used for annotation

Data availability

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding authors

Ethics declarations

Competing interests

Additional information

Electronic supplementary material

Rights and permissions

About this article

Cite this article

Share this article

This article is cited by

Comments

Search

Quick links