Evidence-based tailoring of bioinformatics approaches to optimize methods that predict the effects of nonsynonymous amino acid substitutions in glucokinase

Šimčíková, Daniela; Kocková, Lucie; Vackářová, Kateřina; Těšínský, Miroslav; Heneberg, Petr

doi:10.1038/s41598-017-09810-0

Download PDF

Article
Open access
Published: 25 August 2017

Evidence-based tailoring of bioinformatics approaches to optimize methods that predict the effects of nonsynonymous amino acid substitutions in glucokinase

Daniela Šimčíková¹,
Lucie Kocková¹,
Kateřina Vackářová¹,
Miroslav Těšínský¹ &
…
Petr Heneberg ORCID: orcid.org/0000-0002-0703-951X¹

Scientific Reports volume 7, Article number: 9499 (2017) Cite this article

1247 Accesses
11 Citations
3 Altmetric
Metrics details

Subjects

Abstract

Computational methods that allow predicting the effects of nonsynonymous substitutions are an integral part of exome studies. Here, we validated and improved their specificity by performing a comprehensive bioinformatics analysis combined with experimental and clinical data on a model of glucokinase (GCK): 8835 putative variations, including 515 disease-associated variations from 1596 families with diagnoses of monogenic diabetes (GCK-MODY) or persistent hyperinsulinemic hypoglycemia of infancy (PHHI), and 126 variations with available or newly reported (19 variations) data on enzyme kinetics. We also proved that high frequency of disease-associated variations found in patients is closely related to their evolutionary conservation. The default set prediction methods predicted correctly the effects of only a part of the GCK-MODY-associated variations and completely failed to predict the normoglycemic or PHHI-associated variations. Therefore, we calculated evidence-based thresholds that improved significantly the specificity of predictions (≤75%). The combined prediction analysis even allowed to distinguish activating from inactivating variations and identified a group of putatively highly pathogenic variations (EVmutation score <−7.5 and SNAP2 score >70), which were surprisingly underrepresented among MODY patients and thus under negative selection during molecular evolution. We suggested and validated the first robust evidence-based thresholds, which allow improved, highly specific predictions of disease-associated GCK variations.

An integrated multiomic approach as an excellent tool for the diagnosis of metabolic diseases: our first 3720 patients

Article Open access 25 May 2022

Exome sequencing of 20,791 cases of type 2 diabetes and 24,440 controls

Article Open access 22 May 2019

Gene-level analysis of rare variants in 379,066 whole exome sequences identifies an association of GIGYF1 loss of function with type 2 diabetes

Article Open access 03 November 2021

Introduction

Glucokinase (GCK), which is one of the four mammalian isozymes that phosphorylate glucose, serves as a glucose sensor in pancreatic beta-cells, drives glucose conversion to glycogen in the liver and is expressed in the brain and endocrine cells of the gut¹. GCK is considered a key enzyme in the glycolytic pathway, particularly because of the concentration of substrate at which this enzyme shows half-maximal activity (S _0.5) that is within the physiological range of blood glucose concentrations² and because of its kinetic cooperativity, which is unique among hexokinases. In healthy people, this allows GCK to tune its response following the uptake of glucose-containing food without completely depleting blood glucose levels³.

The inactivating GCK nonsynonymous substitutions cause maturity-onset diabetes of the young (GCK-MODY) or insulin-deficient hyperglycemia when only one allele is affected, and they cause severe permanent neonatal diabetes mellitus (PNDM) when both alleles are inactivated, whereas the activating nonsynonymous substitutions lead to persistent hyperinsulinemic hypoglycemia of infancy (PHHI). Hundreds of nonsynonymous substitutions of the GCK gene have been described in many populations. The heterozygously manifested inactivating nonsynonymous substitutions are usually only associated with mild fasting hyperglycemia. Thus, many patients are not diagnosed because there is an absence of symptoms; a higher perceived prevalence of GCK-MODY is known in countries that perform routine blood glucose screens on pregnant women or oral glucose tolerance test (OGTT) on asymptomatic young relatives within families with multiple cases of type 2 diabetes mellitus⁴. Characterization of GCK nonsynonymous substitutions is a laborious and time-consuming task. Thus, it prevents large-scale analyses for clinical purposes, particularly when considering a structural perspective. Recently, computational evolution- and structure-based prediction analyses were suggested to estimate the effects of particular GCK nonsynonymous substitutions⁵. These analyses aimed to identify nonsynonymous substitutions, which are likely or unlikely to have a serious impact on the protein function and stability. However, these analyses have not been paired with robust experimental data. Currently, experimental data are available based on in vitro kinetic analyses of over a hundred GCK nonsynonymous substitutions e.g., refs 6,7,8,9, and some nonsynonymous substitutions have been newly re-classified as non-pathogenic¹⁰.

In this study, we aimed to provide the first robust evidence for choosing the best-fit method and the evidence-based threshold to predict the effects of GCK nonsynonymous substitutions. For the first time, we compared the outcomes of prediction methods with the outcomes of in vitro measurements reported previously or reported newly in the course of this study, and with clinical information known from patients carrying GCK nonsynonymous substitutions. We calculated the evidence-based thresholds in order to solve the problems with negligible specificity of their previously suggested arbitrary values. By the analysis of total hypothetical GCK nonsynonymous substitutions, we predicted the effects of GCK nonsynonymous substitutions for which the clinical or in vitro data are still absent.

Results

In vitro enzyme kinetics

We analyzed the enzyme kinetics of 16 naturally occurring GCK nonsynonymous substitutions known from MODY patients of Czech origin and the experimental nonsynonymous substitutions R63S, M251C and F260L. Five mutants – that included four naturally occurring GCK-MODY-associated mutants (G81D, M251I, G295D and G385W) and one experimental mutant (M251C) – displayed no or negligible activity at ≤150 mM glucose. Regarding the other mutants, the GCK-MODY-associated mutants M251V, L314P, F316V, G318R and F419L demonstrated a reduced affinity for glucose that was expressed as an elevated S _0.5 measured at 5 mM ATP (one-way ANOVA p < 0.001, F = 808.0, Tukey’s post-tests p < 0.05, excluding the enclosed means). However, when measured at the suboptimal ATP concentration (500 μM), only a partially overlapping set of GCK-MODY-associated mutants (F150L, M251V and A454E) exhibited an elevated S _0.5 (one-way ANOVA p < 0.001, F = 172.3, Tukey’s post-tests p < 0.05, excluding the enclosed means). There was no overlap between those differing significantly from WT-GCK at high and low levels of ATP (Table 1). The Hill coefficient differed among the tested mutants (one-way ANOVA p < 0.001, F = 4.506), but we did not find any significant differences between the mean Hill coefficient of WT-GCK and any of the mutants (all Tukey’s post-tests of WT vs. mutants felt into the category of enclosed means). The ATP K _M was high particularly for F150L (i.e., four-times the control value), and it was also significantly increased in A454 and V33A (one-way ANOVA p < 0.001, F = 212.3, Tukey’s post-tests p < 0.05, excluding the enclosed means). The tested mutants displayed a wide range of k _cat. Two of the MODY-associated mutants (V33A and F316V) exhibited k _cat values similar to WT-GCK, and all the others demonstrated decreased k _cat values compared to WT-GCK. Among the nonsynonymous substitutions tested, we did not find any that had a decreased stability at 30 °C (Table 1). Two nonsynonymous substitutions demonstrated a decreased susceptibility to the competitive inhibitor of the GCK activity, N-acetylglucosamine (GlcNAc). The majority of the nonsynonymous substitutions did not demonstrate any change in IC₅₀ of GlcNAc compared to WT-GCK, except for F150L (>6-fold higher IC₅₀) and A454E (2-fold higher IC₅₀) (Table 1).

Table 1 The kinetic data for WT-GCK, 16 naturally occurring GCK nonsynonymous substitutions associated with MODY patients and the experimental nonsynonymous substitutions R63S, M251C and F260L.

Full size table

Based on the measured data, we calculated the relative activity index (RAI) and the threshold for glucose-stimulated insulin release (GSIR-T) of each tested protein variation. While many demonstrated RAIs at or below 10% of the activity of WT-GCK, there were also MODY-associated protein variations that did not display any major difference in the RAI compared to WT-GCK. These included R250C (RAI 0.94), C434Y (RAI 0.71) and G318R (RAI 0.67). Near-normal levels of RAI resulted in mild changes of GSIR-T associated with these three nonsynonymous substitutions; the GSIR-T only ranged from 5.0 to 5.3 mM glucose. Changes in the enzyme kinetics of the other MODY-associated mutants led to changes in the RAI within the range from 5.7 to 7.1 mM glucose, which is characteristic for the GCK-MODY phenotype.

Within the variations tested, there were also three experimentally designed nonsynonymous substitutions. These included the newly identified activating mutant, R63S, which demonstrated the RAI of 4.17 and the GSIR-T of 2.9 mM and led to a marked decrease in both, S _0.5 and ATP K _M and an increased IC₅₀ of GlcNAc (Table 1). Another experimental mutant was F260L, which demonstrated a neutral phenotype, reaching the RAI of 0.95, the GSIR-T of 5.1 mM, a marginally decreased S _0.5 and ATP K _M and a similar k _cat and IC₅₀ of GlcNAc compared to WT-GCK. The last experimental mutant, M251C, exhibited a strong deactivating effect with barely detectable activity even at high doses of glucose, which led to an estimated RAI < 0.01 and GSIR-T ≥ 7.1 mM (Table 1).

Can prediction methods predict enzyme kinetics of GCK and are they in agreement with clinical data?

The prediction methods demonstrated a generally high sensitivity for deleterious MODY-associated nonsynonymous substitutions, and the sensitivity of all but one method reached at least 75%. However, many methods exhibited low sensitivity when attempting to predict the hypoglycemic phenotype, for which only PoPMuSiC 2.1 and PolyPhen-2 had sensitivity over 90%. However, high sensitivity of the latter two methods was associated with detrimental outcomes when predicting the neutral non-diabetic nonsynonymous substitutions because they reached a sensitivity of only 27 and 38%, respectively. The other prediction methods were more sensitive when predicting normoglycemic nonsynonymous substitutions and were associated with up to 94% sensitivity (SNPs&GO with GO terms excluded). However, when there was a high sensitivity for the nonsynonymous substitutions with neutral phenotypes, the false-positive ratio for detection of such nonsynonymous substitutions among those associated with disease phenotypes was increased. SNPs&GO with GO terms excluded had 28% false positive ratio for predicting nonsynonymous substitutions with the neutral phenotype. Given that the number of known normoglycemic nonsynonymous substitutions is lower by one order of magnitude compared with the disease-associated nonsynonymous substitutions of GCK, the actual number of false-positive hits suggested by SNPs&GO exceeded the number of true positive hits regardless of the inclusion of GO terms (Table 2). Previously, it was suggested to use the prediction methods in combination⁵. However, the combination of the state-of-the-art prediction methods did not lead to any improvement in the prediction of the physiologic effects of nonsynonymous substitutions when referring to the healthy vs. disease-associated status of their heterozygous carriers, particularly when one or more predictors were in disagreement (Fig. 1a).

Table 2 The summary of prediction scores for nonsynonymous substitutions in the GCK amino acid sequence, for which data were available either on their clinical phenotype, the GSIR-T or the Hill coefficient. For a detailed list of nonsynonymous substitutions analyzed, relevant raw data and references, cf. Table S2.

Full size table

The prediction methods demonstrated similar issues when tested against the GSIR-T values, which were calculated based on the experimentally measured enzyme kinetics data. Multiple methods had high sensitivities for both the activating and deactivating nonsynonymous substitutions, which were quantified as those with a GSIR-T lower than 4 mM or exceeding 5.5 mM, respectively. However, they failed to identify those within the normal range (4.0–5.5 mM glucose), which does not lead to PHHI, and is also below the threshold for diabetes. The most sensitive method for the neutral effects was SNPs&GO with GO terms excluded (55% sensitivity). However, the same method was associated with 29% false positive ratio for neutral predictions among nonsynonymous substitutions that cause disease-associated GSIR-T, and it was also associated with only 71% sensitivity for the deactivating nonsynonymous substitutions (Table 1). The combination of predictors was still associated with a poor ability to discriminate between a low and normal GSIR-T and with relatively low number of false-negative predictions among the nonsynonymous substitutions with a high GSIR-T (Fig. 1b).

One of the key features of GCK is its cooperativity. The majority of the tested prediction methods were able to correctly predict extreme changes in the Hill coefficient, which lead to a nearly complete loss of the cooperativity (n _H < 1.2) (Table 1). The combination of predictors was able to identify correctly those inducing complete or nearly complete loss of the cooperativity. However, the predictions were associated with a high number of false-positive results, and the only useful predictions of the absence of severe effects on the Hill coefficient were when seven or less algorithms agreed on the effect of the respective nonsynonymous substitution (Fig. 1c).

Reflecting the above unsatisfactory outcomes of the prediction methods, we analyzed whether the clinical phenotype of the patients can be predicted at all. First, we analyzed the relationship between the disease and experimentally measured enzyme kinetics of GCK by employing detrended correspondence analysis (DCA; Fig. 1d). The analysis involved six basic parameters for the enzyme kinetics, namely S _0.5, n _H, ATP K _M, k _cat, the RAI and the GSIR-T. The eigenvalues were 0.557 (axis 1), 0.111 (axis 2) and 0.026 (axis 3). At the molecular level, functional GCK variations are associated with two groups of phenotypes, the activating phenotypes (that manifest as PHHI) and the inactivating phenotypes (that manifest as GCK-MODY or PNDM). DCA distinguished between the MODY-associated and PHHI-associated nonsynonymous substitutions, and WT-GCK was positioned close to the left boundary of the MODY-associated area. The k _cat and RAI were responsible for most of the variability; fine resolution was allowed by the inclusion of S _0.5 and ATP K _M, which were associated with axis 2 (Fig. 1d). In contrast to the above comparison was the DCA of the nonsynonymous substitutions known from humans with the prediction methods that were suggested previously to predict the effects of GCK nonsynonymous substitutions^{5, 11} (Fig. 1e). The eigenvalues were 0.053 (axis 1), 0.014 (axis 2) and 0.013 (axis 3). The prediction methods with arbitrarily set thresholds did not correctly identify the effects of nonsynonymous substitutions as suggested by low eigenvalues and the overlap of convex hulls assigned to benign, activating and deactivating nonsynonymous substitutions (Fig. 1e). The number of predictors that correctly predicted the effect of each disease-associated nonsynonymous substitution, did not display any strong association with the clinical parameters measured in the affected patients, namely the levels of plasma glucose, glucose after 120 min OGTT, HbA_1c, C-peptide and age at diagnosis (Fig. S1).

In order to improve the predictions, we tested, whether there is a space for the evidence-based adjustments of the arbitrary thresholds of the prediction methods. We found that with evidence-based thresholds, three methods were capable of distinguishing between a group of disease-associated nonsynonymous substitutions and those with uncertain phenotypes. Namely, the EVmutation scores of normoglycemic nonsynonymous substitutions reached −2.39 (95% CI −3.30–−1.32; Fig. 1f). When setting the threshold values to a median minus 2-times SD (EVmutation = −6.31), 43% of MODY-associated nonsynonymous substitutions (but no PHHI-associated nonsynonymous substitutions) passed this threshold applied to the outcomes of the EVmutation. Similarly, PoPMuSiC 2.1 ΔΔG for neutral GCK nonsynonymous substitutions reached 0.58 kcal/mol (95% CI 0.35–0.78 kcal/mol; Fig. 1g). When setting the threshold values to a median plus 2-times SD (ΔΔG = 1.42), 42% of MODY-associated nonsynonymous substitutions (but only two PHHI-associated nonsynonymous substitution) passed this threshold applied to the outcomes of PoPMuSiC 2.1. A similarly calculated threshold for SNAP2 was 6.5 (mean −58, 95% CI −77.06–−39.06), which allowed to classify 75% of the tested MODY-associated (and four PHHI-associated) nonsynonymous substitutions as disease-associated (Fig. 1h). In contrast, SIFT, PolyPhen-2, I-Mutant 3 and AlignGVGD [including Grantham variation (GV) or Grantham deviation (GD) alone] did not allow any improvement in the resolution based on increases in threshold values. Other tested methods, including PhD-SNP and SNPs&GO did not allow this adjustment because they generate only binary outcomes.

We employed the evidence-based thresholds in the estimation of the theoretical frequency of putative MODY-associated variations among total hypothetic GCK nonsynonymous substitutions (Table 3). The distribution of resulting scores of EVmutation (Fig. 1i) and PoPMuSiC 2.1 (Fig. 1j) overlapped for total putative nonsynonymous substitutions and MODY-associated nonsynonymous substitutions, with slightly less MODY-associated nonsynonymous substitutions being associated with low EVmutation scores (Fig. 1i). In contrast, the total and MODY-associated SNAP2 scores did not have the same distribution, and more total nonsynonymous substitutions were associated with high SNAP2 scores (Fig. 1k). The three scores were only incompletely correlated. We found the strongest correlation between the SNAP2 and EVmutation scores (Pearson −0.778, p << 0.001; Spearman 0.789, p << 0.001, n = 8,189 nonsynonymous substitutions), followed by PoPMuSiC 2.1 and EVmutation (Pearson −0.406, p << 0.001; Spearman 0.383, p << 0.001, n = 8,493 nonsynonymous substitutions) and SNAP2 and PoPMuSiC 2.1 (Pearson −0.362, p << 0.001; Spearman 0.339, p << 0.001, n = 8,189 nonsynonymous substitutions). When we combined the three prediction methods, the ternary transformed data allowed to distinguish the nonsynonymous substitutions associated with PHHI or normoglycemia (Fig. 2a) from those associated with MODY (Fig. 2b). The nonsynonymous substitutions with unknown clinical phenotype (so far not observed in humans) followed the same distribution pattern; most of them accumulated in the region of high SNAP2 score and low EVmutation score (Fig. 2c). Raw EVmutation and SNAP2 scores were able to differentiate nonsynonymous substitutions associated with PHHI or normoglycemia (Fig. 2d) from those associated with MODY (Fig. 2f). The distribution pattern of nonsynonymous substitutions, which were so far not observed in humans, suggests the existence of two dominant phenotypes, one considered benign (EVmutation score from −2 to −4 and SNAP2 score <−50) and the other one predicted to be highly pathogenic and surprisingly underrepresented even among MODY patients (EVmutation score <−7.5 and SNAP2 score >70) (Fig. 2f).

Table 3 The estimations of the effects of total hypothetic GCK nonsynonymous substitutions.

Full size table

Comparison between the evolutionary conservation and frequency of nonsynonymous substitutions

We found a total of 301 invariant residues (65%) out of the total 465 residues of GCK when comparing GCK protein sequences from 12 vertebrate species (Fig. S2). The large and fully evolutionarily conserved regions particularly included the residue positions: 52–66, 77–93, 140–158, 160–180, 182–217, 225–238, 248–261, 404–417 and 441–451. All known glucose binding (T168, K169, N204, D205, E256 and E290), ATP binding (T228, T332, S336, V412 and L415) and allosteric (R63, Y215, M210, Y214, V452 and V455) sites of GCK^{12, 13} were fully evolutionarily conserved except for T332 (GV = 57.75) and V452 (GV = 29.61), which also displayed a high conservation (defined as GV < 61.3) (Table S1).

When plotted against the frequency of 1596 disease-associated families with nonsynonymous substitutions in GCK, all but one of the sites mutated in 15 or more families with MODY or PHHI (n = 23 amino acids) were considered as highly conserved, with zero GV score (Fig. 2g,h). The only exception was S383, which was repeatedly reported to be mutated to leucine or, less frequently, threonine in multiple European and Canadian families⁴ (Table S1). There is an overall correlation between positions that are mutated at a high frequency and sites that have a low GV (Pearson −0.182, p << 0.001; Spearman −0.394, p << 0.001; Fig. 2g,h). This correlation thus indicates that the nonsynonymous substitutions at highly conserved positions are strongly contributing to the disease manifestation, whereas the others may escape attention because they may not be associated with any phenotypes. Not all conserved residues were frequently mutated; for example, there were no nonsynonymous substitutions associated with the evolutionarily conserved residues 83–90. However, the extremely low frequency of disease-associated nonsynonymous substitutions was in exon 1 (any of its three forms), which corresponds to high GV values associated with the N-terminus of GCK (Fig. 2g,h).

Discussion

The use of prediction algorithms seems to be necessary part of science and diagnostics, especially in time of next-generation sequencing and other omics studies, for which the complete functional analyses of protein variations found are unfeasible and ineffective. Widely used databases list the outcomes of some of these algorithms; for example, the Ensembl genome browser provides the predictions generated by the SIFT and PolyPhen algorithms for each of the nonsynonymous substitutions listed¹⁴. This is convenient because the use of these algorithms minimizes the amount of nonsynonymous substitutions studied only to those predicted to be deleterious and thus potentially causing the phenotype in the respective study subject^{15, 16}. The two predictors implemented in the Ensembl genome browser are also widely used in studies focusing on particular proteins, including those that focus on the activity of GCK. Some of these studies generated data, which are in agreement with the two predictors¹⁰, but accumulating evidence suggests that they may exhibit surprisingly high false positive rates (e.g., 29% and 43%, respectively, as reported by Romeo et al.¹⁷) and surprisingly low rates of correct predictions (53 and 63%, respectively¹⁷). Therefore, their outcomes may be over-interpreted when used without matching the data with the measurements of enzyme kinetics and clinical data. The field of prediction methods is developing quickly, with the EVmutation method being the latest important contribution to the field¹¹. The EVmutation method reflects the epistasis by explicitly modeling interactions between all the pairs of residues in proteins, and was claimed to outperform dramatically the nowadays broadly used SIFT and PolyPhen methods¹¹. However, here we have shown that EVmutation is associated with similar issues of poor sensitivity for activating and neutral nonsynonymous substitutions as are the previously developed models, despite the sensitivity of EVmutation for deactivating nonsynonymous substitutions was similar as in other prediction methods with the best performance (Table 2). The use of evidence-based thresholds is necessary in order to avoid low selectivity (Figs 1–2). The evidence-based adjustment of the thresholds allowed confident identification of up to three quarters of MODY-associated nonsynonymous substitutions, but all the methods failed to identify selectively the nonsynonymous substitutions associated with normoglycemia or hypoglycemia. The latter two groups largely overlapped in the outcomes of all the tested prediction methods and they were also interspersed with a minority of MODY-associated variations (which, however, dominate the datasets, and thus blur any analyses of nonsynonymous substitutions associated with normoglycemia or hypoglycemia; Table 2, Figs 1–3).

When focusing on MODY-associated genes, the most authoritative work was published by Flanagan et al.¹⁸, who tested 66 gain-of-function and 67 loss-of-function nonsynonymous substitutions in GCK, ABCC8 and KCNJ11. They concluded that the sensitivity of SIFT and PolyPhen reached 69% and 68%, but the specificity was only 13% and 16%, respectively, with both predictors predicting more precisely the loss-of-function nonsynonymous substitutions. In another study, Rees et al.¹⁹ found that SIFT and PolyPhen failed to correctly predict three out of 15 nonsynonymous substitutions in GKRP. False predictions of the benign phenotype in GKRP by PolyPhen were also noticed by Johansen et al.²⁰. When focusing on GCK, most of the previous studies, which focused on MODY-associated nonsynonymous substitutions, noticed overall concordance between disease-associated phenotypes and predictions of deleterious effects based on SIFT and PolyPhen^{10, 21,22,23,24}. This conclusion is in agreement with the present study, which found that these two methods tend to correctly identify MODY-associated nonsynonymous substitutions but also identify a large part of neutral nonsynonymous substitutions as deleterious and fail to correctly distinguish hypoglycemia-associated nonsynonymous substitutions (Table 2, Fig. 3).

In addition to the prediction analysis, in the present study, we provided basic kinetic characterization of 19 nonsynonymous substitutions in GCK. In agreement with previous studies^{25, 26}, the dataset of MODY-associated nonsynonymous substitutions included some of the nonsynonymous substitutions, which paradoxically had near-normal kinetics. These included R250C with a GSIR-T of 5.0 mM and C434Y with a GSIR-T of 5.1 mM (Table 1). The C434Y affects one of the experimentally confirmed nitrosylation sites within the GCK molecule²⁷. Although the function of C434 nitrosylation is unknown (in contrast with the modification of C371), the association of this nonsynonymous substitution with four independent Czech families²⁸ clearly suggests its role in MODY onset and progression. Additionally, R250C is associated with a strong phenotype with manifestation during childhood and with confirmed family history²⁹, and it is known in MODY patients of Serbian and Czech origin^{29, 30}. All prediction algorithms suggest its deleterious effect (Table S2). Thus, our data illustrated that the MODY-associated nonsynonymous substitutions employ various mechanisms of action. Of particular interest is the decreased sensitivity of F150L to the regulation via GlcNAc. This glucose derivative is known to inhibit glucokinase competitively in a cooperative manner³¹. It is a natural part of biopolymers on the surface of many pathogens and natural compounds in our food. Another hexokinase isoform was already confirmed to serve as a sensor for the detection of bacterial GlcNAc³²; thus, glucokinase can also serve as a pattern recognition receptor, and pathogen- and food-derived GlcNAc may differentially affect GCK action in health and disease. Such regulation would be impaired in GCK-MODY patients heterozygous for F150L. This speculation requires further verification in the near future.

In conclusion, this study provided the first robust evidence for choosing the best-fit method and the evidence-based threshold to predict the effects of GCK nonsynonymous substitutions for which in vitro data are still absent. Even with the newly proposed evidence-based thresholds, the precision of the available methods allowed predicting correctly up to 75% of true MODY-associated variations, leaving the remaining quarter of true MODY-associated nonsynonymous substitutions in the grey zone of uncertain predictions. The combined computational analysis of total hypothetical GCK nonsynonymous substitutions identified a group of putatively highly pathogenic variations (EVmutation score <−7.5 and SNAP2 score >70), which were surprisingly underrepresented among MODY patients. We speculate that a negative selection may play a role in the low frequency of variations predicted to be highly pathogenic despite the heterozygous manifestation of inactivating GCK variations is generally associated with only a relatively mild disease phenotype.

We reported here a novel approach how to interpret the outcomes of prediction methods. The tailor-made thresholds based on non-phenotypic variations can be used to improve predictions that reflect genuine effects. On a model of GCK, the combination of EVmutation and SNAP2 with implemented evidence-based thresholds seems to be readily applicable and feasible for the analyses of any newly found GCK variations. Further research should elucidate, whether the same approach can be generalized and applied across the proteome. Evidence-based cross-correlations of new bioinformatics methods with available experimental and clinical data represent the most promising approach that allows analyzing newly found amino acid sequence variations without time-consuming experiments, especially when multiple kinds of measurements would be needed in order to characterize the effect of a particular amino acid sequence variation.

Methods

We prepared a series of GCK constructs (Table S3) that carry previously published^{24, 28, 30, 33,34,35} nonsynonymous substitutions that are associated with the GCK-MODY phenotype in patients of Czech origin. Additionally, we prepared new constructs carrying nonsynonymous substitutions R63S, M251C and F260L for experimental purposes only. The constructs consisted of the wild-type (WT)-GCK isoform 1 in pGEX-5X-2 (kind gift from Dr. Navas³⁶), to which we introduced the particular nonsynonymous substitutions via site-directed mutagenesis. We measured the glucokinase activity spectrophotometrically using a coupled reaction with glucose-6-phosphate dehydrogenase, and we identified the increasing concentrations of NADPH at 340 nm as described³⁷. We computed the kinetic parameters, including the Hill coefficient n _H, the turnover number k _cat, S _0.5, ATP K _M, the protein stability, RAI and GSIR-T, as described^37,38,39. For each construct, we also performed a competitive inhibition assay using GlcNAc and calculated its IC₅₀ at 5 mM glucose and 5 mM ATP.

We employed nine prediction methods (for a detailed overview, cf. Suppl. Methods.) for the prediction of phenotypic effects of GCK nonsynonymous substitutions. These included methods that use evolution-based sequence information (SIFT, PhD-SNP), as well as those that take into account the chemical and physical characteristics of amino acids (Align-GVGD) or protein structural attributes combined with multiple sequence alignment-derived information (EVmutation, PolyPhen-2, SNAP2 and SNPs&GO). A single amino acid nonsynonymous substitution can result in notable change in the protein stability, which is represented by a change in its Gibbs free energy (∆∆G) upon folding. Therefore, we also employed two predictors that focused on the stability properties of nonsynonymous substitutions, I-Mutant 3.0 and PoPMuSiC 2.1. We performed these predictions for an up-to-date set of published GCK-MODY- and PHHI-associated nonsynonymous substitutions and for those, which are not associated with any monogenically inherited disease effects. In addition to referring to relevant publications, we retrieved data on the GCK nonsynonymous substitutions from the Ensembl, dsSNP, UniProtKB and HGMD databases. We matched the predictions with the previously published and newly generated experimental data on enzyme kinetics and with clinical data available from previously published studies on GCK-MODY, PHHI and PNDM. Data are shown as the means ± SE, unless stated otherwise.

Data availability

A detailed overview of the nonsynonymous substitutions analyzed, including the outcomes of the prediction methods, previously published and newly generated experimental data on enzyme kinetics, available clinical data and relevant references are all listed in Table S2; detailed protocols are provided in Suppl. Methods.

References

Jetton, T. L. et al. Analysis of upstream glucokinase promoter activity in transgenic mice and identification of glucokinase in rare neuroendocrine cells in the brain and gut. J. Biol. Chem. 269, 3641–3654 (1994).
CAS PubMed Google Scholar
Lenzen, S. A fresh view of glycolysis and glucokinase regulation: history and current status. J. Biol. Chem. 289, 12189–12194 (2014).
Article CAS PubMed PubMed Central Google Scholar
Larion, M. et al. Kinetic cooperativity in human pancreatic glucokinase originates from millisecond dynamics of the small domain. Angew. Chem. Int. Ed. 127, 8247–8250 (2015).
Article Google Scholar
Osbak, K. K. et al. Update on mutations in glucokinase (GCK), which cause maturity‐onset diabetes of the young, permanent neonatal diabetes, and hyperinsulinemic hypoglycemia. Hum. Mutat. 30, 1512–1526 (2009).
Article CAS PubMed Google Scholar
George, D. C. et al. Evolution-and structure-based computational strategy reveals the impact of deleterious missense mutations on MODY 2 (maturity-onset diabetes of the young, type 2). Theranostics 4, 366–385 (2014).
Article PubMed PubMed Central Google Scholar
Glaser, B. et al. Familiar hyperinsulinism caused by an activating glucokinase mutation. N. Engl. J. Med. 338, 226–230 (1998).
Article CAS PubMed Google Scholar
Massa, O. et al. High prevalence of glucokinase mutations in Italian children with MODY. Influence on glucose tolerance, first-phase insulin response, insulin sensitivity and BMI. Diabetologia 44, 898–905 (2001).
Article CAS PubMed Google Scholar
Gloyn, A. L. et al. Prevalence of GCK mutations in individuals screened for fasting hyperglycaemia. Diabetologia 52, 172–174 (2009).
Article CAS PubMed Google Scholar
García-Herrero, C.-M. et al. Functional characterization of MODY2 mutations highlights the importance of the fine-tuning of glucokinase and its role in glucose sensing. PLoS ONE 7, e30518 (2012).
Article ADS PubMed PubMed Central Google Scholar
Steele, A. M. et al. The previously reported T342P GCK missense variant is not a pathogenic mutation causing MODY. Diabetologia 54, 2202–2205 (2011).
Article CAS PubMed Google Scholar
Hopf, T. A. et al. Mutation effects predicted from sequence co-variation. Nat. Biotechnol. 35, 128–135 (2017).
Article CAS PubMed PubMed Central Google Scholar
Kamata, K., Mitsuya, M., Nishimura, T., Eiki, J. & Nagata, Y. Structural basis for allosteric regulation of the monomeric allosteric enzyme human glucokinase. Structure 12, 429–438 (2004).
Article CAS PubMed Google Scholar
Molnes, J. et al. Binding of ATP at the active site of human pancreatic glucokinase-nucleotide-induced conformational changes with possible implications for its kinetic cooperativity. FEBS J. 278, 2372–2386 (2011).
Article CAS PubMed PubMed Central Google Scholar
Ensembl genome browser 88. Available from http://www.ensembl.org/ (2017).
Flannick, J. et al. Assessing the phenotypic effects in the general population of rare variants in genes for a dominant Mendelian form of diabetes. Nat. Genet. 45, 1380–1385 (2013).
Article CAS PubMed PubMed Central Google Scholar
Dupuis, J. et al. New genetic loci implicated in fasting glucose homeostasis and their impact on type 2 diabetes risk. Nat. Genet. 42, 105–116 (2010).
Article CAS PubMed PubMed Central Google Scholar
Romeo, S. et al. Rare loss-of-function mutations in ANGPTL family members contribute to plasma triglyceride levels in humans. J. Clin. Invest. 119, 70–79 (2009).
CAS PubMed Google Scholar
Flanagan, S. E., Patch, A. M. & Ellard, S. Using SIFT and PolyPhen to predict loss-of-function and gain-of-function mutations. Genet. Test. Mol.Biomarkers 14, 533–537 (2010).
Article CAS PubMed Google Scholar
Rees, M. G. et al. Correlation of rare coding variants in the gene encoding human glucokinase regulatory protein with phenotypic, cellular, and kinetic outcomes. J. Clin. Invest. 122, 205–217 (2012).
Article CAS PubMed Google Scholar
Johansen, C. T., Wang, J. & Lanktree, M. B. Excess of rare variants in genes identified by genome-wide association study of hypertriglyceridemia. Nat. Genet. 42, 684–687 (2010).
Article CAS PubMed PubMed Central Google Scholar
Beer, N. L. et al. Insights into the pathogenicity of rare missense GCK variants from the identification and functional characterization of compound heterozygous and double mutations inherited in cis. Diabetes Care 35, 1482–1484 (2012).
Article CAS PubMed PubMed Central Google Scholar
Kanthimathi, S. et al. Glucokinase gene mutations (MODY 2) in Asian Indians. Diabetes Technol. Therap. 16, 180–185 (2014).
Article CAS Google Scholar
Estalella, I. et al. Mutations in GCK and HNF-1α explain the majority of cases with clinical diagnosis of MODY in Spain. Clin. Endocrinol. 67, 538–546 (2007).
CAS Google Scholar
Valentínová, L. et al. Identification and functional characterization of novel glucokinase mutations causing maturity-onset diabetes of the young in Slovakia. PLoS ONE 7, e34541 (2012).
Article ADS PubMed PubMed Central Google Scholar
Sagen, J. V. et al. From clinicogenetic studies of maturity-onset diabetes of the young to unraveling complex mechanisms of glucokinase regulation. Diabetes 55, 1713–1722 (2006).
Article CAS PubMed Google Scholar
Gloyn, A. L. et al. Insights into the structure and regulation of glucokinase from a novel mutation (V62M), which causes maturity-onset diabetes of the young. J. Biol. Chem. 280, 14105–14113 (2005).
Article CAS PubMed Google Scholar
Rizzo, M. A. & Piston, D. W. Regulation of β cell glucokinase by S-nitrosylation and association with nitric oxide synthase. J. Cell Biol. 161, 243–248 (2003).
Article CAS PubMed PubMed Central Google Scholar
Pruhova, S. et al. Glucokinase diabetes in 103 families from a country-based study in the Czech Republic: geographically restricted distribution of two prevalent GCK mutations. Pediatr. Diabetes 11, 529–535 (2010).
Article CAS PubMed Google Scholar
Milenković, T., Zdravković, D. & Mitrović, K. [Novel glucokinase mutation in a boy with maturity-onset diabetes of the young]. Srp. Arh. Celok. Lek. 136, 542–544 (2008).
Article PubMed Google Scholar
Pinterova, D. et al. Six novel mutations in the GCK gene in MODY patients. Clin. Genet. 71, 95–96 (2007).
Article CAS PubMed Google Scholar
Cárdenas, M. L., Rabajille, E. & Niemeyer, H. Suppression of kinetic cooperativity of hexokinase D (glucokinase) by competitive inhibitors. A slow transition model. Eur. J. Biochem. 145, 163–171 (1984).
Article PubMed Google Scholar
Wolf, A. J. et al. Hexokinase is an innate immune receptor for the detection of bacterial peptidoglycan. Cell 166, 624–636 (2016).
Article CAS PubMed PubMed Central Google Scholar
Lukášová, P. et al. Screening of mutations and polymorphisms in the glucokinase gene in Czech diabetic and healthy control populations. Physiol. Res. 57, S99–S108 (2008).
PubMed Google Scholar
Pruhova, S. et al. Genetic epidemiology of MODY in the Czech republic: new mutations in the MODY genes HNF-4α, GCK and HNF-1α. Diabetologia 46, 291–295 (2003).
Article CAS PubMed Google Scholar
Urbanová, J. et al. Positivity for islet cell autoantibodies in patients with monogenic diabetes is associated with later diabetes onset and higher HbA_1c level. Diabet. Med. 31, 466–471 (2014).
Article PubMed Google Scholar
García-Herrero, C. M. et al. Functional analysis of human glucokinase gene mutations causing MODY2: exploring the regulatory mechanisms of glucokinase activity. Diabetologia 50, 325–333 (2007).
Article PubMed Google Scholar
Davis, E. A. et al. Mutants of glucokinase cause hypoglycaemia- and hyperglycaemia syndromes and their analysis illuminates fundamental quantitative concepts of glucose homeostasis. Diabetologia 42, 1175–1186 (1999).
Article CAS PubMed Google Scholar
Matschinsky, F. M. Assessing the potential of glucokinase activators in diabetes therapy. Nat. Rev. Drug Discov. 8, 399–416 (2009).
Article CAS PubMed Google Scholar
Matschinsky, F. M. et al. The glucokinase system and the regulation of blood sugar. In Matschinsky, D. M. & Magnuson, M. A., Eds Molecular pathogenesis of MODYs. Basel, Karger, pp. 99–108 (2000).

Download references

Acknowledgements

We thank Maria Angeles Navas (Complutense University of Madrid) for the WT-GCK construct, Michal Anděl, Blanka Rypáčková, Jitka Tomešová and Jana Urbanová for sharing the data on the newly identified GCK-MODY patients and Michal Boušek for expert technical assistance. The study was supported by the Czech Science Foundation project 15-03834Y and Charles University in Prague projects Primus/MED/32 and 260387/SVV/2017. All financial support for the work was acknowledged.

Author information

Authors and Affiliations

Charles University, Third Faculty of Medicine, Prague, Czech Republic
Daniela Šimčíková, Lucie Kocková, Kateřina Vackářová, Miroslav Těšínský & Petr Heneberg

Authors

Daniela Šimčíková
View author publications
You can also search for this author in PubMed Google Scholar
Lucie Kocková
View author publications
You can also search for this author in PubMed Google Scholar
Kateřina Vackářová
View author publications
You can also search for this author in PubMed Google Scholar
Miroslav Těšínský
View author publications
You can also search for this author in PubMed Google Scholar
Petr Heneberg
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

P.H. and D.S. conceived and designed the experiments, analyzed the data, wrote the paper and are responsible for the integrity of this work. D.S., L.K., K.V. and M.T. acquired data. All authors revised the article’s intellectual content and approved the final version.

Corresponding author

Correspondence to Petr Heneberg.

Ethics declarations

Competing Interests

The presentation of work-in-progress data was supported by Eli Lilly. The authors declare that they have no other conflict of interest.

Additional information

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Supplementary Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Šimčíková, D., Kocková, L., Vackářová, K. et al. Evidence-based tailoring of bioinformatics approaches to optimize methods that predict the effects of nonsynonymous amino acid substitutions in glucokinase. Sci Rep 7, 9499 (2017). https://doi.org/10.1038/s41598-017-09810-0

Download citation

Received: 09 June 2017
Accepted: 28 July 2017
Published: 25 August 2017
DOI: https://doi.org/10.1038/s41598-017-09810-0

This article is cited by

Identification of alkaline pH optimum of human glucokinase because of ATP-mediated bias correction in outcomes of enzyme assays
- Daniela Šimčíková
- Petr Heneberg
Scientific Reports (2019)
Refinement of evolutionary medicine predictions based on clinical evidence for the manifestations of Mendelian diseases
- Daniela Šimčíková
- Petr Heneberg
Scientific Reports (2019)

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.