Introduction

Obesity in adults and children is one of the largest worldwide health-care problems. A recent comprehensive analysis of overweight and obesity in 195 countries between 1990 and 2015 has revealed sobering trends in that the rate of increase of obesity in children has been greater than the rate of increase in adults.1 Obesity is defined as the accumulation of excess body fat, in addition to the accumulation of adipose tissue in the liver, skeletal muscle and pericardial region, all of which can lead to the development of obesity’s complications.2 Insulin resistance (IR) occurs early and is possibly the main mechanism leading to the metabolic complications of obesity, such as type 2 diabetes mellitus (T2D), dyslipidaemia, early atherosclerosis and cardiovascular disease.3

Environmental and genetic factors contribute to IR. The main environmental factors are obesity, sedentary behaviour, stress, nutritional factors (such as excessive intake of fructose in liquid form and branched-chain amino acids) and sleep deprivation.4 IR is associated with certain well-known human monogenic disorders.5 The first single gene mutation responsible for severe IR was discovered in the insulin receptor gene (INSR) in 1988.6, 7 A comprehensive review of monogenic forms of IR is given by Semple et al.,8 and we will list here only a few monogenic genes causing severe IR. Directly functionally connected to INSR are mutations in HMGA1, a transcription factor binding to the INSR promotor. Downstream of insulin signalling, mutations were identified in AKT2 and AS160. Lipodystrophy disorders have also resulted in severe IR such as certain alleles in CAV1, PTRF, CIDEC and BSCL2. Digenic mutations in PPARG and PPP1R3A have revealed that a combination of mutations in genes of lipid or carbohydrate metabolism can result in IR. Moreover, several complex syndromes primarily due to severe obesity and hyperphagia have also been associated with severe IR (see also Semple et al.8). In addition, several genetic variants have been associated with IR in humans using candidate gene and genome-wide association study (GWAS) approaches.5, 9 There is a higher incidence of IR with the simultaneous presence of obesity and associated genetic variants.10 This implies that genetic predisposition to IR does not necessarily lead to IR, but IR develops as the genetic predisposition interacts with environmental factors, especially excessive body weight. However, it could also be that excessive body weight and IR are both consequences of the same (environmental) exposure.

Several genetic loci have been associated with T2D and are collated in the GWAS catalogue (https://www.ebi.ac.uk/gwas). This catalogue is manually curated in a quality-controlled manner and collects data from the literature of published GWAS assaying at least 100 000 single nucleotide polymorphisms (SNPs) and all SNP–trait associations with P-values<1.0 × 10−5.11 As of 5 October 2017, 179 significant GWAS loci in 21 studies for a term insulin resistance were recorded, but essentially all of these were derived from studies of adult populations. Only three GWAS studies reported associations also in children: an LEPR gene variant,12 a common variant in the FTO gene in a European population,13 and several genetic loci in Hispanic children for T2D and IR.14

The above-mentioned studies, therefore, refer to the identification of genes associated with IR in fully developed T2D, mostly in adults. Much less is known about the genetic factors important in the early phases of T2D and in the relationship between obesity and IR. One large-scale meta-analysis study in adults identified six previously unknown loci associated with IR, implying that IR loci can function in different ways than T2D genes.15 There are no similar studies in children and adolescents specifically designed to identify IR loci. However, the authors of one study demonstrated the cumulative role of several genetic variants linked to T2D (in genes TCF7L2, HHEX, SLC30A8, WFS1, KCNJ11, KCNQ1, MTNR1B, FTO and PPARG) in the development of prediabetes in adolescents. A greater number of variants was associated with a higher likelihood of prediabetes in this population.16 In another study, the authors linked genetic variants in genes TCF7L2, IGF2BP2, CDKAL1 and HHEX1A with oral glucose tolerance test results in adolescents, identifying reduced insulin secretion as an important pathophysiologic factor.17 In the study of Xi et al.,18 two SNPs in or near the GNPDA2 and KCTD15 genes were significantly associated with the risk of IR in Chinese children. In a cross-sectional cohort of Greek children and adolescents of European descent, a significant association was detected between CDKAL1 SNPs and IR. Recently, a candidate gene approach study demonstrated that the Pro12Ala polymorphism in PPARG was associated with IR in Mexican children and suggested that this relationship was modified by dyslipidaemia.19 Therefore, genetic studies of T2D in adults and epidemiological, as well as non-genetic (such as nutritional and physical activity), studies of IR and T2D in children are numerous, whereas genetic studies specifically aiming to identify novel genes predisposing to IR in children and adolescents are lacking.

GWAS is an approach that allows examination of common genetic variants throughout the genome. Several studies have demonstrated that DNA pooling can detect the most promising candidate SNPs or genes, with considerable savings in time and costs.20, 21, 22, 23, 24 Based on a comprehensive review, theoretical calculations and experimental validations comparing classical GWAS studies using individual genotyping to GWAS with DNA pooling suggested that pooling-based GWAS is a much more efficient strategy for identifying genetic associations with diseases or traits.25 Perhaps the most important advantage of pooled-DNA GWAS is its efficiency of study design, which requires three orders of magnitude less financial input than GWAS strategies based on individual genotyping in the first phase. In addition, pooled-DNA GWAS can be effectively applied to studies involving smaller populations. Specifically, in rare diseases or in smaller populations such as the Slovenian paediatric population, it is difficult to obtain an appropriately large sample for GWAS using individual genotyping. In the present study, GWAS was performed using DNA pools from cases and controls of obese children and adolescents with and without IR.

Materials and methods

Patients

Prior to inclusion in the study, all participants or their legal guardians signed an informed consent. The study protocol was approved by The Slovenian National Medical Ethics Committee (no. 25/10/09). The study included the first cohort of 198 obese children and adolescents managed by the Department of Endocrinology, Diabetes and Metabolism, University Children’s Hospital, Ljubljana, Slovenia for obesity. Characteristics of the subjects are shown in detail in Table 1. For the purposes of the replication of molecular genetic analysis of identified SNPs in an independent population, the second cohort of additional 157 obese children and adolescents, managed by the same department, matched for age, gender status, degree of overweight and IR status to the primary cohort was included. Cohort was divided into an IR+ group (39 boys, 40 girls; mean age=13.9±2.6 years, mean standardized body mass index (BMI-SDS)=2.9±0.5) and IR− group (38 boys, 40 girls; mean age=13.8±2.9 years, mean BMI-SDS=2.8±0.4) using criteria identical to that used for the pooled-DNA GWAS cohort.

Table 1 Cohorts of obese adolescents with (IR+) and without (IR−) IR

Pubertal status of the subjects was determined in both cohorts according to Tanner26, 27; it was, however, not used as a matching criterion. Characteristics of the subjects are also shown in detail in Table 1. The two cohorts were comparable regarding the pubertal status ratios. In the original first cohort, there were 17/42 (40%) prepubertal, 23/56 (41%) midpubertal and 59/101 (58%) postpubertal subjects in IR+ group. In the IR− group, there were 25/42 (60%) prepubertal, 33/56 (59) midpubertal and 42/101 (42%) postpubertal subjects. In the replication (second) cohort, there were 8/19 (42%) prepubertal, 30/65 (46%) midpubertal and 39/73 (53%) postpubertal subjects in IR+ group. In the IR− group, there were 11/19 (58%) prepubertal, 35/65 (54%) midpubertal and 34/73 (47%) postpubertal subjects.

Height (cm) and weight (kg) of participants were measured by trained medical staff using validated wall-mounted stadiometers (Quick Medical, Issaquah, WA, USA) and electronic digital scales (Alba, Vojnik, Slovenia). BMI values were calculated as weight (kg)/height2 (m2). Obesity was defined as a BMI-SDS>2.28

IR assessment

Oral glucose tolerance test was performed in all subjects after an overnight fast. Blood samples were taken before (0 min) and 30, 60 and 120 min after glucose ingestion (1.75 g kg−1; maximum, 75 g).29 The concentration of glucose was measured using a routine hexokinase-based protocol and an Olympus AU400 Chemistry Analyser (Olympus, Tokyo, Japan). The plasma insulin concentration was assessed with the Immulite 2000 Insulin solid-phase enzyme-labelled chemiluminescent immunometric assay using an Immulite 2000 analyser (Siemens, Berlin, Germany).

IR was determined using the whole-body insulin sensitivity index (WBISI) and homeostatic model assessment—IR (HOMA-IR). WBISI values were calculated using this formula:

with glucose concentrations in mmol l−1 and insulin concentration in pmol l−1.30 HOMA-IR values were calculated using this formula: , with the glucose and insulin concentrations in the same units as for WBISI.31 The WBISI threshold was set at <3 and the HOMA-IR threshold was set at >2.5.

Genome-wide association study

GWAS was performed on pooled samples of the first cohort of obese children and adolescents to search for genetic variants. Obese adolescents matched for gender, age and BMI-SDS were divided into two groups based on IR status (Table 1). The group of obese children and adolescents with IR (IR+) included 48 boys and 50 girls (mean age=13.5±2.6 years, mean BMI-SDS=3.0±0.5) and the group of obese children and adolescents without IR (IR−) included 50 boys and 50 girls (mean age=12.6±2.9 years, mean BMI-SDS=2.8±0.5). Genomic DNA was isolated from whole blood samples using the FlexiGene DNA Isolation Kit (Qiagen, Hilden, Germany). Before pooling, genomic DNA was diluted to 100 ng l−1 to ensure equal representation of each sample. GWAS was conducted by the Beijing Genomics Institute (BGI, Hong Kong, Hong Kong) using SNP-chip Ilumina HumanOmni5-Quad v1.0 (Illumina Inc., San Diego, CA, USA). Each IR+ or IR− pool was assessed three times, generating three replicates per each IR group.

Data normalization and quality control

To compute the allele frequency estimates from the pooled data, the raw two colour (green/red) bead scores were extracted from the HumanOmni5-Quad (v1.0) array scans. The raw intensity scores, as reported by the Illumina GenomeStudio software (v1.9.4), require calibration before further processing to correct for manufacturing and/or assaying properties that could bias allele frequency estimations.22, 32, 33 The green/red ratio tends to systematically differ between different arrays and array strips.32, 33 Calibration of the raw intensity data was performed on a strip-by-strip basis by rescaling the red bead score signal to achieve a mean pooling allele frequency (PAF) value of 0.5 for all SNPs on a given strip. The PAF values were computed as the scaled red intensity value divided by the total corrected red plus green intensity value. SNPs for which the mean PAF value was supported by less than four individual beads on a chip (for any given sample) were excluded from further processing. Altogether, on the basis of PAF, we excluded 495 884 SNPs, which represents 11.5% of all SNPs on the microarray. Normalized PAF values were then used for principal component analysis, as implemented in scikit-learn (http://scikit-learn.org), to confirm the clustering of replicate samples. Mean pooling variances and statistical tests of nine comparisons between IR+ and IR− pools are shown in Supplementary Table 1.

Statistical analysis of GWAS data

A linear model-based approach22, 32, 33 was applied using a set of PAF estimates for the IR+ and IR− group comparisons. For each pair of test and control samples, a binominal sampling variance was calculated as V using this formula: in which PAFt and PAFc were the pooling allele frequencies for the test and control samples, respectively, and n was the number of SNPs in a sample. Pooling variance related to the construction of pools from non-identical samples (for example, test and control pools) was estimated as var (epooling-2) according to this formula33: . Pooling variance was calculated for each case–control pair of technical replicates separately and then averaged across all comparisons. The estimated pooling variances were used in χ2 test as previously described22: . Prioritization of SNPs that differed significantly between groups was performed using a modification of the sliding window method, as described previously.5

Individual genotyping and replication

GWAS results were evaluated by two independent genotyping assays, the high-resolution melting (HRM) analysis and TaqMan test. HRM analysis was performed using MeltDoctor master mix (ThermoFisher Scientific, Waltham, MA, USA) and appropriate oligonucleotide primers (Eurofins Scientific, Luxembourg, Luxembourg) on a 7500 Fast Real-Time PCR System (ThermoFisher). For segment sequencing, a 3500 Genetic Analyser (ThermoFisher) was used. Genotyping using TaqMan assays (ThermoFisher) was also conducted on a 7500 Fast Real-Time PCR System.

Statistical analysis of SNP genotyping data

Fisher’s exact test for 2 × 2 contingency tables was used to analyse data obtained from SNP genotyping. Online tool VassarStats (Vassar College, Poughkeepsie, NY, USA) enables calculation of P-values, odds ratios (ORs) and 95% confidence intervals (95% CIs). Based on these calculations, the allele distribution between two cohorts could be evaluated. The post hoc statistical power (1−β) of Fisher’s exact test for an α of 0.05 was calculated with the G*Power calculator v3.1.34 A P-value<0.05 and statistical power >50% and a P-value<0.05 and statistical power >80% were criteria to include SNP in further analysis. Following the Cochran–Armitage trend test, the combined set of P-values by recessive, dominant and additive model were analysed for false discovery rate (FDR) using the two-stage Benjamini, Krieger and Yekutieli FDR procedure.35 The q-value was optimized in such a way that a set of »discoveries« did not include any potential false positive result (q=0,11), consequently only the P-values below a threshold of 0.0239 were considered as statistically significant results.

Results

Genome-wide association analysis of pooled DNA for identification of novel IR loci in paediatric Slovenian population

We employed a pooled-sample GWAS instead of GWAS on individual DNAs as a cost-effective and feasible method for analysing smaller populations to reduce genetic variability. The underlying hypothesis was that the IR+ DNA pool (participants with IR) would contain more susceptibility alleles for IR than the IR− DNA pool (participants without IR), which could be detected as a difference in allele frequencies or in relative allele signal intensities.

In the first step, allele frequency estimates were computed from the raw pooled data of the HumanOmni5-Quad (v1.0) SNP array scans. Following extensive calibration, the mean PAF was calculated and the normalized PAF values were employed in a principal component analysis. Figure 1 shows the principal component analysis plot of the distribution of three IR+ pools and three IR− pools. The three replicates within each group are highly correlated, suggesting low technical variability and, hence, validity of the experimental procedures. The amount of variance explained by the first two component, PC1 and PC2, is 42% and 16%, respectively. Given this relatively low amount of explained variance in the PC2 compared with the PC1 component, the spread along the y axis alone is not indicative of the high within-group sample variability. IR+ and IR− clusters are far apart indicating a low correlation between the two groups and, therefore the existence of true global genetic differences (biological variance).

Figure 1
figure 1

Principal component (PC) analysis plot of distribution of three IR+ pools and three IR− pools. The amount of variance explained by the first two component, PC1 and PC2, is 42% and 16%, respectively. As the percentage of explained variance in the PC2 compared with the PC1 component is low, the spread along the y axis alone does not indicate the high within-group sample variability (that is, low technical variance).As IR+ and IR− clusters are far apart, this indicates true global genetic differences (biological variance) between the groups.

Following computation of the pooling variance, SNPs were ranked by increasing P-values derived from the χ2 test results. For each SNP, a mean rank in a sliding window of 10 consecutive neighbouring SNPs was calculated (Supplementary Table 1). The sliding window method was used to identify regions that show consistent differences between allele frequencies of SNPs in the case and control groups. The obtained SNP mean rank values were normalized (divided by the number of all tested SNPs) and −log10 transformed before plotting. Supplementary Figure 1 shows Manhattan plots of the top five significant SNPs: rs212540 at Chr1:21266624, rs3218888 at Chr2:102014739, rs252111 at Chr5:142005685, rs9261108 at Chr6:30007810, and rs2258617 at Chr20:25274318. Genomic coordinates for these and other SNPs are displayed along the x axis, with the negative logarithm of the SNP’s association P-value on the y axis. The strongest associations have the smallest P-values and hence their negative logarithms are the greatest. The most significant SNPs in the pooled-DNA GWAS analysis were then evaluated in the next step.

Evaluation of top SNPs from pooled-DNA GWAS analysis on individual DNA samples of the first cohort

The five candidate SNPs (Supplementary Figure 1) identified during GWAS analysis of pooled IR+ and IR− DNA cohorts were selected for re-evaluation on individual DNA samples (first cohort) constituting the DNA pools: 98 children and adolescents from the IR+ group and 100 from the IR− group. To achieve high accuracy and efficiency of genotyping, two different scoring approaches were employed, HRM analysis and TaqMan assay. Table 2A displays the SNP genomic coordinates, associated genes and allelic ratios from the global HapMap project,36 European 1000 genomes37 and allele frequency ratios from our study. Allelic ratios for all five SNPs from our study are very similar to HapMap and especially to European values, which is expected given the geographical location of the studied Slovenian paediatric population. Table 2B displays minor allelic frequencies in each individual pool with averages and s.e. given separately for IR+, IR− and combined IR+ and IR− populations. Minor allelic frequencies do not deviate much between replicates within IR+ or IR− pools, suggesting low technical variability in our array experiment. However, the mean minor allelic frequencies between the IR+ and IR− pools (Table 2B) become apparent, suggesting that true genetic differences between the IR+ and IR− pools exist.

Table 2A Top five candidate SNPs from GWAS-pool analysis on individual samples that composed the pools
Table 2B Top five candidate SNPs from GWAS-pool analysis on individual samples that composed the pools

Following HRM and TaqMan genotyping, we performed statistical analyses for significant differences between the IR+ and IR− groups for each SNP. Results based on HRM identified statistically significant differences between analysed cohorts for a dominant inheritance model for rs212540 and rs252111 (rs212540: P-value=0.010, (1−β)=0.742, OR=2.315, 95% CI=1.258–4.260; rs252111: P-value=0.014, (1−β)=0.670, OR=0.421, 95% CI=0.217–0.816) and a recessive inheritance model for rs3218888 and rs2258617 (rs3218888: P-value=0.014, (1−β)=0.670, OR=2.500, 95% CI=1.216–5.141; rs2258617: P-value=0.006, (1−β)=0.803, OR=2.795, 95% CI=1.349–5.790). Analysis of rs9261108 did not return statistically significant results and was thus eliminated from further analysis.

The four SNPs exhibiting significant results after HRM genotyping of individuals comprising the GWAS pools were re-genotyped on individual DNA samples using TaqMan probe assays (Table 3) to verify the results obtained by the HRM method. Table 3 presents results of testing the dominant, additive and recessive models of inheritance and provides P-values, ORs and 95% CIs. A two-stage Benjamini, Krieger and Yekutieli FDR procedure for FDR was used and P-values less than a threshold value of 0.0239 were considered as significant. For SNP rs2258617, a recessive and additive models showed significant associations in the first cohort. SNP rs212540 was significant for a recessive model. For SNP rs252111, we cannot exclude dominant or an additive model though an additive model fits the data best (P>0.0062). SNP rs3218888 reached suggestive significance (P>0.0397) for a recessive model.

Table 3 Statistical analysis for each SNP after HRM and TaqMan genotyping of individual DNAs from the first cohort of IR+ and IR− groups

Locus zoom plots for these four SNPs are shown in Figure 2. Top significant SNPs lie in a relatively narrow regions on chromosomes 1, 2, 5 and 20. A closer examination of these regions tagged by these SNPs revealed that they were located within genes ECE1, IL1R2, GNPDA1 and PYGB that have not yet been associated with IR but have roles and functions that can be associated with IR (see Discussion section).

Figure 2
figure 2

Locus zoom plots for genome-wide significant IR loci that were replicated in individual genotyping of the first cohort. Locus zoom plots are shown for regions with top SNPs within the genes ECE on chromosome 1 (a), IL1RA on chromosome 2 (b), GNPDA1 on chromosome 5 (c), and PYGB on chromosome 20 (d). Top candidate IR genes are shown on the top of the panel with the most significant SNP indicated within the plot. Closely linked genetic map with the chromosomal physical map are shown on the x axis. The unbroken blue line indicates the recombination rate within the region (right y axis). Each filled circle represents the log10 P-value (left y axis), with the top SNPs in red, and other SNPs in the vicinity are coloured based on their degree of correlation (r2) with the top SNP.

Replication of top SNPs in a second independent cohort and in a combined analysis of the first and second cohorts

Although our pooled-DNA GWAS results were confirmed by genotype analysis of individual DNA samples from IR+ and IR− cohorts, we aimed at testing the four statistically significant SNPs for replication in an additional independent cohorts of IR+ and IR− obese children and adolescents (Table 1) and in a combined analysis of the first and second cohorts (Table 4). For this replication study, a TaqMan genotyping assay was employed. The participants of the second cohort, who were not related to the members of the pooled-DNA GWAS constituting the first cohort, were divided into an IR+ group (39 boys, 40 girls; mean age=13.9±2.6 years, mean BMI-SDS=2.9±0.5) and an IR− group (38 boys, 40 girls; mean age=13.8±2.9 years, mean BMI-SDS=2.8±0.4) using criteria identical to that used for the pooled-DNA GWAS cohort. Analysis in this second cohort showed suggestive significant differences between groups (Table 4) for rs2258617 located in the PYGB gene using the recessive inheritance model (P-value=0.039, (1−β)=0.533, OR=2.330, 95% CI=1.085–5.003). The other three SNPs failed to return statistically significant results in the second independent cohort (data not shown). Additionally, analysis of all four SNPs was performed on data from a merged first and second cohorts. This combined analysis included 177 obese children and adolescents in the IR+ group (87 boys, 90 girls; mean age=13.9±2.6 years, mean BMI-SDS=3.0±0.5) and 178 obese children and adolescents (88 boys, 90 girls; mean age=13.1±2.9 years, mean BMI-SDS=2.8±0.4) in the IR− group. The SNP rs2258617 within PYGB again remained statistically significant for both recessive and additive models and associations were stronger than the association in either of the two cohorts individually (Table 4).

Table 4 Statistical analysis for rs2258617 after TaqMan assay performed on original and independent cohorts and merged data from both cohorts

Discussion

Since its first theoretical studies20, 38 and experimental tests,22, 32 GWAS analysis using DNA-pooling methodology has proven to be a very time- and cost-effective strategy compared with larger-scale conventional GWAS requiring individual genotyping of the entire study population. Using pooled-DNA GWAS, both known and novel genetic variants have been identified in various diseases or traits.39, 40 Of relevance to our study, the pooled-DNA GWAS approach has been successfully used in studies including smaller numbers of participants.41, 42, 43 Altogether, the aim of our study was to determine whether GWAS using DNA-pooling methodology could identify genetic variants associated with IR in obese children and adolescents.

Some already known but also novel IR-related loci have been identified in our study following our GWAS-pool analyses. All loci, calculated as sliding window of 10 consecutive neighbouring SNPs, that surpassed statistical significance threshold are shown in Supplementary Table 1. Our single-nucleotide variant rs2237447 (chr7:50640147) maps to the GRB10 gene very close to the GRB10 SNP rs10248619 that was significantly associated with fasting glycaemic traits and IR in a GWAS study of Manning et al.15 Additionally, another risk allele at rs2237457 was shown to be associated with T2D and glucose excursion during oral glucose tolerance test in the Old Order Amish Study.44 GRB10 interacts with insulin receptors and inhibits their signalling45 and hence is functionally well connected to the traits in our study. Another single-nucleotide variant rs227070 from our study, is located closely to rs11212617, which was identified as T2D-related locus with both variants mapping to intronic regions of the ATM gene.46 In this GWAS study examining glycaemic response to metformin in T2D, common variants within the ATM gene were reported. ATM has been known to cause Ataxia Telangiectasia (A-T; OMIM no. 208900), which is a neurodegenerative disorder but patients also develop marked IR and have increased risk of diabetes.47 Additionally, loss-of-function mutation of Atm in mice leads to diabetes.48 As the third gene, RNF14, which was significant in our pooled-sample GWAS analysis, was significant in a GWAS study of associations with amyotrophic lateral sclerosis.49 Impaired glucose tolerance in patients with amyotrophic lateral sclerosis has been documented decades ago,50 which was confirmed also in several follow-up studies.51, 52 Apart from comparing SNP gene-based hits from our list of statistically significant SNPs (Supplementary Table 1), we also compared locations of closely linked regions around our SNPs with studies not reported in the GWAS catalogue. In a recent exome-chip study of genetic variants on diabetes-related metabolic traits,53 rs272893 was found significant that is located in SLC22A4, a gene found associated with T2D already in previous GWAS studies and closely linked to our significant SNP rs2522052 (Supplementary Table 1). For further detailed analyses on individual analyses in two independent cohorts, we have chosen the five top SNPs because they have not been previously described and because they showed the highest mean rank values. However, the significant candidate SNPs in Supplementary Table 1 present potential new genetic variants to be explored further especially because some of the IR loci detected here in a paediatric population might not show up in similar studies in the adults.

Candidate SNPs after pooled-DNA GWAS analysis of the first cohort

The pooled-DNA GWAS analysis was performed on pooled samples divided into IR+ and IR− cohorts. Three technical replicates per cohort were used to minimize pooling errors, and the SNP-chip Ilumina HumanOmni5-Quad v1.0 platform was used, which was previously shown to be robust and able to extract maximal available information from pooled DNA.54 Additionally, the pooling study design was specifically chosen to reduce further variability not attributable to biological variance of the IR+ group versus IR− group. Individuals constituting the IR+ and IR− groups were carefully selected by matching them for gender, age and BMI-SDS; we consider this to be an important strength of our study. As IR increases physiologically during puberty5 and with the degree of overweight,55 our matching strategy should have decreased the probability of detecting SNPs associated with the confounding effects of gender, age and degree of obesity instead of IR status.

Five candidate SNPs with the highest statistical significance scores were identified in the pooled-DNA GWAS analysis of the first cohort. None of these five SNPs have been previously associated with IR or any other traits according to the GWAS catalogue database (accessed on 2 February 2017). Four SNPs (rs212540, rs3218888, rs252111 and rs2258617) remained significant after HRM analysis of individual genotypes. These four SNPs were also re-genotyped with the TaqMan assay, as this is more accurate than HRM. Although eventually only one SNP (rs2258617) withstood two further, stringent verification steps, the four significant SNPs after the first-phase pooled-DNA GWAS analysis nevertheless warrant some discussion.

The rs212540 SNP is located in an intron region of the endothelin converting enzyme 1 (ECE1) and has been associated with cardiovascular complications of diabetes,56 as well as with adult human height57 and childhood obesity-related traits in a Hispanic population.14 The rs3218888 SNP is located in an intron of the interleukin 1 receptor type 2 (IL1R2). This gene is from a family of interleukins that are frequently linked to causes of obesity-associated complications.58 The rs252111 SNP that is located in glucosamine-6-phosphate deaminase 1 (GNPDA1) has an important housekeeping function in carbohydrate derivative metabolism (Gene Ontology database). In addition, its important paralogue, GNPDA2, has been significantly associated with the risk of IR in Chinese children.18 SNP rs9261108 is located in the 6p22 region where HLA-J, ZNRD1-AS1 and RNF39 overlap. HLA-J is a pseudogene of HLA-A,59 a gene associated with Graves’ disease,60 an autoimmune-metabolic disorder of the thyroid gland while RNF39 SNP has recently been associated with non-obstructive coronary artery disease.61

Although four out of five significant SNPs in our pooled GWAS analysis continued to be significant in the follow-up validation analysis of individual DNA samples, only SNP rs2258617 within PYGB remained significant in the independent second cohort and the merged first and second cohort analysis. The level of association of all four SNPs diminished following the second-phase validation step. Possible explanations for the reduced association in the second cohort may be pool-based or array-based experimental errors or variation in allele frequency because of the relatively small pool sample size. However, we surmise that a major factor may be the relatively small size of the second cohort. Given the high level of statistical significance in the first cohort analysis and the potential functional relevance of the associated genes or regions as discussed above, our results for the aforementioned four SNPs justify further analyses in larger cohorts of Slovenian or other populations.

rs2258617 (PYGB) is the strongest novel candidate IR SNP

The most robust result of our study was identification of a candidate SNP rs2258617 located in an intron of the glycogen phosphorylase, brain form (PYGB) at the 20p11 region. This SNP, as well as neighbouring SNPs, gave high significant values in our pooled-DNA GWAS analysis (Figure 2, Supplementary Table 1). This association was also validated in the same cohort through individual genotyping and confirmed in an independent replication cohort of IR+ and IR− obese children and children and adolescents in Slovenia (Table 4). Statistical analysis for rs2258617 was additionally performed on merged data from the first and second cohorts. Higher statistical values were obtained with this larger population, indicating that our study indeed identified a strong candidate region associated with an increased causal likelihood for IR (Table 4).

SNP rs2258617 resides within the PYGB gene,62 coding the enzyme that catalyses the rate-determining step in glycogen degradation by releasing glucose-1-phosphate from a terminal alpha-1,4-glycosidic bond. This enzyme thus has a key role in glucose homeostasis. Its activity is regulated allosterically and by reversible phosphorylation.62 Mammals have three isozymes of glycogen phosphorylase: liver, muscle, and brain. Liver and muscle isozymes ensure a steady supply of energy to the liver and skeletal muscles, respectively. The brain form is responsible for ensuring glucose supply to the brain, especially under stressful conditions.63 Although the name implies specificity for brain tissues, several transcriptome studies clearly demonstrate its expression and possible function in several other tissues, with some tissues (for example, epithelial cells, thyroid, heart, colon) exhibiting even higher expression than in the brain.64

In the GWAS catalogue database, no results were found for the rs2258617 SNP, thereby suggesting that we potentially identified a novel candidate gene for IR. However, one GWAS65 aimed at identifying genetic variants for serum calcium concentrations found a hit for a closely linked SNP within the PYGB gene. The effects of serum calcium levels on insulin release were established decades ago,66 and subsequent population studies confirmed that perturbed calcium homeostasis correlates with abnormalities of fasting serum glucose, IR and pancreatic beta-cell function.67 Such studies indicate that PYGB very likely has functional relevance in IR, possibly through its actions in other tissues besides the brain. Further support for this comes from tissue and developmental stage-specific data collated in the mouse. The mouse studies demonstrate that expression of Pygb is high during embryogenesis and in the adult nervous, visceral, endocrine, liver and biliary systems. One other study also found significant differential protein expression of Pygb in T2D mice treated with rapamycin for cardiac dysfunction.68 Moreover, in an in vitro study of pancreatic cancer cells, inhibiting PYGB increased the sensitivity of cells to glucose starvation, partially explaining the manner in which glucose is restricted in tumour cells.69 Although the PYGB gene has not been comprehensively studied, especially in dedicated analyses of insulin and glucose homeostasis, the above studies that collaterally found associations between PYGB and glucose metabolism and IR imply that PYGB may act as a pleotropic gene that is not necessarily connected with a brain-specific function, despite its name.

Conclusions

Using pooled-DNA GWAS analyses, we identified five SNPs and corresponding genes significantly associated with IR in a population of obese children and adolescents: rs212540 (ECE1), rs3218888 (IL1R2), rs252111 (GNPDA1), rs9261108 (HLA-J), and rs2258617 (PYGB). Significant associations were validated for four SNPs (rs212540, rs3218888, rs252111 and rs2258617) on follow-up analyses of individual DNA samples, whereas rs2258617 (PYGB) continued to be significant in an independent cohort and in a merged analysis of the first and second cohorts. To our knowledge, the five SNPs from the pooled-DNA GWAS analysis have not been previously reported in GWAS studies as being associated with IR or related traits. For these five regions, and especially the four that were validated in a replicative individual DNA analyses, it would be of interest to further investigate their possible association with IR in genetic studies of larger cohorts and other functional studies.

The main result of our study is the identification of rs2258617 in the PYGB gene as being associated with significant differences in frequencies of alleles between the IR+ and IR− groups. This SNP was significant in the pooled-DNA GWAS analysis, validation analyses of individual genotypes, replication study in an independent cohort and merged analysis of the first and second cohorts. A recessive or additive mode of inheritance was supported by a high OR and low P-value. As the HapMap and European population frequencies show very close to intermediate frequencies for the C:T SNP rs2258617, such frequencies are likely expected also in the background (Slovenian) population. In the large first cohort, allele C was much more frequent in the IR− group (60%) than in the IR+ group (48%). This suggests that this SNP is a common and frequent allele in the population, which can potentially serve as an informative diagnostic genetic marker for early detection of IR in obese children and adolescents. Much research remains to be conducted to explore the mechanism by which PYGB genetic variants affect IR, which could lead to the development of novel preventative or therapeutic strategies to combat IR. This, in turn, offers the prospect of personalizing treatment based on genotype and opens a route for exploring novel drug treatment opportunities.

In conclusion, we report for the first time a pooled-DNA GWAS analysis of IR and insulin-sensitive obese children and adolescents in the Slovenian population identifying five significant SNPs or genes. Strongest support in validation and replication studies was found for the rs2258617 SNP, suggesting that the PYGB gene may be involved in the genetic control of IR and thereby providing a new target for further basic research of the mechanisms underlying IR and for the development of potential new therapies for IR and T2D.