Blood-based analysis of type-2 diabetes mellitus susceptibility genes identifies specific transcript variants with deregulated expression and association with disease risk

Despite significant progress by genome-wide association studies, the ability of genetic variants to conduce to the prediction or prognosis of type-2 diabetes (T2D) is weak. Expression analysis of the corresponding genes may suggest possible links between single-nucleotide polymorphisms and T2D phenotype and/or risk. Herein, we investigated the expression patterns of 24 T2D-susceptibility genes, and their individual transcript variants (tv), in peripheral blood of T2D patients and controls (CTs), applying RNA-seq and real-time qPCR methodologies, and explore possible associations with disease features. Our data revealed the deregulation of certain transcripts in T2D patients. Among them, the down-regulation of CAPN10 tv3 was confirmed as an independent predictor for T2D. In patients, increased expression of CDK5 tv2, CDKN2A tv3 or THADA tv5 correlated positively with serum insulin levels, of CDK5 tv1 positively with % HbA1c levels, while in controls, elevated levels of TSPAN8 were associated positively with the presence of T2D family history. Herein, a T2D-specific expression profile of specific transcripts of disease-susceptibility genes is for the first time described in human peripheral blood. Large-scale studies are needed to evaluate the potential of these molecules to serve as disease biomarkers.

Patients and samples. Peripheral blood samples were collected from 48 consecutive T2D patients and 40 control (CT) individuals (with normal glucose metabolism), upon informed written consent. The study was approved by the Ethics Committee of the Attikon Hospital (Athens, Greece). Both groups were characterized according to the current criteria for T2D diagnosis 18 . The medical records of the participants were evaluated for various clinical, laboratory, and therapeutic variables. The group of controls consisted of two distinct subgroups: controls without risk factors for the development of T2D (CT RF− ; n = 17) and controls bearing risk factors for the disease (CT RF+ ; n = 23), as these were previously described by Nathan 2 (Table 2). RNA extraction, RT-qPCR and RNA-seq. All the methods used for gene-expression analysis, were applied on RNA extracted from whole peripheral blood using direct-blood lysis. For materials and protocols applied for RNA extraction, reverse transcription, and qPCR, as well as cDNA library construction and RNA-seq, see Supplemental Fig. 1. Specific primers designed for the amplification of the genes-of-interest, or certain transcript variants of them are reported in Supplemental Table 1. Relative quantification (RQ) of gene expression was performed by the 2 −ΔΔCt method, using the hypoxanthine phosphoribosyltransferase 1 (HPRT1) gene, as endogenous reference gene for normalization purposes, and the immortalized 1.2B4 pancreatic beta-cell line (ECACC, Salisbury, UK), as our assay calibrator, for the calculation of the fold-changes. Representative amplification, melting and standard curves for certain genes/transcript variants are indicatively presented in Supplemental Fig. 2. Bioinformatics analysis. For analysis of RNA-seq raw data see Supplemental Fig. 1. Differential expression between the two groups was considered significant if fold-change of their average RPKMs (CT:T2D ratio) was <0.5 or >2. Area-proportional Euler diagrams were generated using the BioVenn tool (http://www.biovenn.nl/). Analysis of tissue-specific expression patterns of genes and transcript variants and expression quantitative trait loci (eQTLs) was performed utilizing the portal of the Genotype-Tissue Expression (GTEx) project 19 and the Blood eQTL browser 20 . Statistical analysis. Differential expression patterns between CT and T2D, or among CT RF− , CT RF+ , and T2D individuals, were explored using the non-parametric Mann-Whitney U or Jonckheere-Terpstra tests, respectively. Benjamini-Hochberg procedures for adjusting the false discovery rate (FDR = 0.25) in multiple comparisons were also applied. Possible associations with binary, ordinal or continuous values of various clinicopathological and laboratory parameters were investigated by Mann-Whitney U, Jonckheere-Terpstra, or Spearman's rank correlation coefficient tests, respectively. Binomial multivariate logistic regression analysis was performed (enter model) using the occurrence of T2D as the dependent variable and the expression levels of genes and transcript variants, age and sex as independent variables. Analyses were performed using the softwares: GraphPad Prism 5 (GraphPad Software Inc., San Diego, CA, USA) or IBM SPSS Statistics 21 (IBM Corp., Armonk, NY, USA). P-values < 0.05 were considered significant.

Statement of Ethical Approval and Informed Consent. The study was approved by the Ethics
Committee of the Attikon Hospital (Athens, Greece). All procedures followed were in accordance with the ethical standards of the responsible committee on human experimentation (institutional and national) and with the Helsinki Declaration of 1975, as revised in 2008 (5). Informed consent was obtained from all patients for being included in the study.

Results
Differential expression of certain T2D-susceptibility genes in patients versus controls. Firstly, specifically designed qPCR protocols applied on RNA extracted from whole peripheral blood samples, detected quantifiable expression levels in the cases of the following 20 genes : CAMK1D, CAPN10, CDC123, CDK5,  CDKAL1, CDKN2A, CDKN2B, FTO, HHEX, IGF2BP2, JAZF1, KCNJ11, KCNQ1, NOTCH2, PPARG, SLC30A8,  TCF7L2, THADA, TSPAN8 and WFS1. On the contrary, mRNA expression was not detected in samples of either patients or controls, in the cases of ADAMTS9, HNF1B, HNF4A and MTNR1B genes. Relative quantification (RQ) values (median; range) in the groups of T2D patients (n = 48) and controls (n = 40) are summarized in Table 3 (Table 3 and Fig. 1A; upper row). Further analysis within the group of CTs, revealed that CT RF+ individuals (n = 23) were characterized by elevated levels of the three abovementioned genes compared to CT RF− (n = 17) ones (Table 3 Table 3. After applying correction for multiple comparisons, the differential expression patterns that remained significant were those of: CDK5 and TSPAN8 between CT and T2D groups, and of CDK5, CDKN2A and TSPAN8 among the CT RF− , CT RF+ and T2D groups.

RNA-Seq analysis.
Following the quantification of the expression levels of the abovementioned 20 genes, we studied also the expression patterns of their individual transcript variants in order to: (i) detect the specific transcript variant(s) responsible for the aforesaid differences and (ii) reveal any possible "hidden" differences in the levels of specific variants of the rest 17 genes, in patients versus controls. For that reason, data from RNA-Seq performed on peripheral blood samples of representative T2D patients (n = 4) and controls (n = 2) were analyzed appropriately.
Focusing on the 24 genes-of-interest, T2D patients were found to express: (i) higher levels of the genes  Table 2), compared to controls.

Laboratory
HbA1c levels (% or mmol/ml); median (range) 5.6 (4.9-6.1) 6.6 (5.  Table 2. Characteristics of control individuals (CT) and patients (T2D) included in the study. * Risk factors associated with higher risk of T2D, included: (i) BMI >25, (ii) prior history of gestational diabetes, (iii) hypertension, (iv) dyslipidemia, (v) cardiovascular disease, or vi) first-degree family member with T2D 2 . † BMI was calculated as weight (kg) divided by the square of height (m 2 ). ‡ Central obesity was regarded if waist circumference was ≥102 cm (40 in) in men or ≥88 cm (35 in) in women. § Hypertension was regarded if blood pressure was ≥130/85 mm Hg (or receiving drug therapy for hypertension); ‖ Hyperlipidemia (defined by the Adult Treatment Panel III of the National Cholesterol Education Program 53 . ¶ Metabolic syndrome was diagnosed according to the NCEP-ATP III report 54 requiring at least 3 of the following 5 conditions: (i) fasting glucose ≥100 mg/dL (or receiving drug therapy for hyperglycemia), (ii) blood pressure ≥130/85 mm Hg (or receiving drug therapy for hypertension), (iii) triglycerides ≥150 mg/dL (or receiving drug therapy for hypertriglyceridemia), (iii) HDL-C <40 mg/dL in men or <50 mg/dL in women (or receiving drug therapy for reduced HDL-C), (iv) waist circumference ≥102 cm (40 in) in men or ≥88 cm (35 in) in women. NA: not applicable.
Based on the above findings, the panel of the T2D-specific transcript variants finally included the: CAPN10 tv3, CDK5 tv1, CDK5 tv2, CDKN2A tv3, CDKN2A tv4, IGF2BP2 tv7, KCNQ1 tv1, THADA tv5 and TSPAN8. Among them, binomial multivariate analysis corrected for age and sex revealed that CAPN10 tv3 can predict T2D among participants of the current study (p = 0.022, OR = 0.726). A schematic representation of the T2D-specific transcript variants, also in comparison with the canonical transcript for each gene, is shown in Fig. 2. Associations of mRNA levels with clinicopathological data. The levels of the T2D-specific transcripts were found to associate with various clinicopathological parameters (Supplemental Table 3). Notably, the associations revealed were different in patient versus control groups.
In detail, CDK5 tv1 levels correlated positively with serum insulin (μU/ml) and glycated haemoglobin (HbA1c; % or mmol/mol) levels and negatively with hyperlipidemia in T2D patients, while positively with serum triglycerides levels (mg/dl) (p < 0.05) in CT RF+ subjects. As for the CDK5 tv2 levels, these also correlated strongly with serum insulin levels, and moreover with the presence of central obesity in the T2D group (p < 0.05).
The levels of CDKN2A tv3 were as well significantly associated with serum insulin levels in T2D patients (p < 0.01), while these of CDKN2A tv4 correlated positively with BMI and waist-to-hip ratio and negatively with serum HDL levels (mg/dl) in the CT RF+ subgroup (p < 0.05).
Furthermore, correction for multiple comparisons confirmed certain of the above correlations: a) In T2D subjects, serum insulin levels were associated with the levels of CDK5 tv1, CDK5 tv2, CDKN2A tv3, KCNQ1 tv1, THADA tv5 and TSPAN8, hyperlipidemia was associated with the levels of CDK5 tv1 and THADA tv5, and BMI with the levels of IGF2BP2 tv7; b) In control individuals, serum insulin levels were correlated with the levels of IGF2BP2 tv7, serum HDL levels, BMI, and waist-to-hip ratio with the levels of CDKN2A tv4, and family history of T2D with the levels of TSPAN8.
Various associations were also detected between clinicopathological data and the levels of transcripts which did not exhibit any differential distribution among the groups of patients and controls (data not shown).
Analysis of the tissue-specific expression pattern and eQTLs of the differentially expressed genes using public available datasets. Based on public available data of the GTEx portal 19 , the CAPN10, CDK5, CDKN2A, IGF2BP2, KCNQ1, THADA and TSPAN8 genes, as well as their transcript variants CDK5 tv1, CDK5 tv2, IGF2BP2 tv7, KCNQ1 tv1 and THADA tv5, are expressed in a series of human tissues including blood and T2D-target tissues (adipose tissue, liver, skeletal muscle, pancreas) (Fig. 3). No data were available for CAPN10 tv3, CDKN2A tv3 and tv4.
Blood eQTL browser and GTEx portal were used for the analysis of eQTLs in the differentially expressed genes. For each one, a significant number of eQTLs appeared; however, we focused on eQTLs linked to T2D-related SNPs in blood or T2D-target tissues (Supplemental Table 4). Significant eQTLs related to T2D-SNPs were found on CAPN10, CDKN2A, IGF2BP2 and KCNQ1 in blood cells, and on CAPN10 and TSPAN8 in skeletal muscle. Affected genes are reported in Supplemental Table 4.
Also, CAPN10, the most significantly differentiated gene herein, is suggested to be an eGene (that is a gene having at least one cis-eQTL acting upon it), affected by several eQTLs on GPR35, RNPEPL1, and/or itself. Nevertheless, only the rs5030952-eQTL has a known (GWAS) association with T2D development (Supplemental Table 5).

Discussion
Despite numerous GWAS and other clinical studies revealing a large pool of SNPs associated with T2D, none of them has been yet proven promising for its diagnosis and/or prognosis 5 . Moreover, their causal relationship with T2D pathogenesis is not well-defined; epigenetic mechanisms can influence the expression of T2D-susceptibility genes, while DNA sequence itself alters the way the epigenome exerts its regulatory effect [21][22][23][24][25] . Furthermore, eQTLs associated with T2D-genetic variants may be also involved 26 . Herein, the expression profile of a panel of T2D-susceptibility genes, with special focus on their individual transcripts, was investigated in peripheral blood of T2D patients and controls, utilizing a combination of high-throughput and highly-sensitive molecular methodologies (RNA-Seq and qPCR). Our data revealed a T2D-specific pattern of expression of nine transcript variants of the genes: CAPN10, CDK5, CDKN2A, IGF2BP2, KCNQ1, THADA and TSPAN8. Compared to controls, patients exhibited down-regulated levels of the tv3 of CAPN10 and tv1 of KCNQ1, while up-regulated levels of CDK5 tv1 and tv2, CDKN2A tv3 and tv4, IGF2BP2 tv7, THADA tv5 and TSPAN8. Publicly available datasets suggest that many human tissues including peripheral blood and T2D-target tissues express the abovementioned genes, allowing to postulate that the T2D-specific expression pattern found herein, may reflect pathogenetic mechanisms in disease-affected organs and/or peripheral blood 14,15,19,20 .
Among the down-regulated molecules is the tv3 of CAPN10. The gene encodes a protein implicated in glucose transporter 4 (GLUT4) translocation, insulin secretion, and apoptotic processes in pancreatic cells 27 . Compared to the canonical variant 1, tv3 lacks two consecutive exons, resulting in the loss of an in-frame segment in the 3' coding region, and the encoded isoform (c) is shorter than isoform a (Fig. 2). The gene bears the rs3792267 and rs5030952 T2D-related SNPs 3,16 ; however, tv3 does not bear any of them. Therefore, the decreased expression levels observed in patients, might be due to epigenetic and/or other transcriptional regulations 24,25 . It is worth mentioning that, herein, CAPN10 tv3 exhibited the highest association with the disease among all the deregulated molecules, as attested by both univariate and multivariate analysis. Yet, it is the only transcript which showed no association with any of the clinicopathological parameters tested, possibly indicating the highly complicated molecular networks underlying T2D.
The levels of the canonical transcript (tv1) of KCNQ1 (Fig. 2) were also decreased in the T2D group, while they were lower in the CT RF+ compared to the CT RF− group. ΚCNQ1 encodes the pore-forming subunit of a voltage-gated K + channel (KvLQT1) that is essential for the repolarization phase of the action potential in cardiac muscle 28 . It is also expressed by pancreatic islets 29 , plays a key-role in the regulation of insulin secretion 30 and its genetic variants have been associated with impaired insulin secretion in humans 31,32 . In these terms, the decreased expression of KCNQ1 tv1 in the CT RF+ and T2D groups observed in our study, and the reverse correlation with serum insulin levels, may indeed reflect the negative regulation on insulin secretory function exerted by KCNQ1 in patients and pre-disposed individuals. Among the T2D-up-regulated molecules, there are both the two transcripts of CDK5 (Fig. 2). CDK5 is a serine/threonine protein kinase, involved in the degeneration of beta-cells and obstruction of insulin secretion through the generation of p35/CDK5 complexes 33 ; its inhibition has been shown to protect these cells from glucotoxicity 34 . The overactivity of CDK5 and its activator p35 have been as well correlated with neuronal dysfunction in patients with Alzheimer's disease (AD), and this could be one of the possible common mechanisms shared by these two degenerative disorders 34 . CDK5 is highly regulated by the T2D-susceptibility gene CDKAL1 17 , through the rs7756992 SNP of the latter, which increases the risk for T2D 35 . Our data showed decreased levels of CDK5 tv1 and 2 in T2D patients compared to controls; though they exhibited different distribution patterns and correlated with different clinicopathological parameters in the CT RF− versus CT RF+ groups, but both with increased serum insulin levels. This might be probably attributed to different transcriptional regulations and distinct pathogenetic mechanisms leading, however, to the same "pre-disease" phenotype. Moreover, it is reported that as tv1, the tv2 of CDK5, in which an in-frame coding exon is skipped 36 (Fig. 2), is also a negative regulator of Wnt/β-catenin signalling, a pathway involved in T2D development 37 . SNPs in the CDKN2A/B locus were recently implicated in the negative regulation of beta-cell mass, proliferation and insulin secretory function, as well as in metabolic processes in adipose tissue, liver and muscles 22 . Also, in human islets, this locus is affected by epigenetic factors 38 , however, no effect on gene expression is known 22 . CDKN2A/B variants affect also the risk for cardiovascular disease 39 and cancer 40 , and this could be a link for the common pathogenetic mechanisms shared with T2D 41 . Additionally, in blood, T2D-associated SNPs on CDKN2A are eQTLs which affect the expression of PSEN1 19 involved in AD and cancer 42,43 , connecting possibly these three morbidities. In this study, T2D patients expressed elevated levels of CDKN2A, and specifically of CDKN2A tv3 and 4: the first one highly associated with serum insulin levels in patients, and the second one with certain T2D-risk factors in controls. This may suggest their differential implication in disease development and/ or progress. However, tv3 contains an alternative open reading frame (Fig. 2) and it is specifically expressed in the pancreas 44 . CDKN2A tv4 has a distinct first, but shares a common second exon with the canonical tv1, translated in different reading frames (Fig. 2): the encoded protein (p14ARF) lacks sequence similarity to the classic isoform (p16INK4a), and it is known to be nucleoplasmic but also recruited to mitochondria 45 . These characteristics may suggest tv-specific functions, possibly implicated in disease's pathogenesis.
IGF2BP2 binds the 5′ UTR of the insulin-like growth factor 2 mRNA and regulates its translation 16 . Moreover, T2D-related SNPs on IGF2BP2 are eQTLs affecting SENP2, a gene crucially involved in adipogenesis and T2D development 46 . Herein, the levels of the tv7 of IGF2BP2 (which lacks exons 1 and 2 compared to the canonical tv1) exhibited a significant stepwise up-regulation from CT RF− to CT RF+ and to T2D individuals. Their correlation with BMI in patients, and serum glucose and insulin levels in CT RF+ cases, indicates its functional involvement in T2D pathogenesis.
Patients exhibited also elevated levels of TSPAN8 (only one known tv) and of THADA tv5 (with alternative 3′ coding region and 3′ UTR, encoding a shorter isoform (c) with a distinct C-terminus) (Fig. 2). The first gene is regarded as a prognostic factor and potential therapeutic target for certain human carcinomas [47][48][49][50] , while chromosomal aberrations of the second are observed in benign thyroid adenomas 51 . They both bear SNPs associated with T2D, though there is no knowledge regarding their involvement in its development 3,16 . Herein, the correlation of their levels with T2D, certain parameters and/or risk factors provides the first evidence for their possible implication in T2D pathogenesis.
The levels of TCF7L2, the most highly-related T2D-susceptibility gene, as well as of other T2D-susceptibility genes, were comparable between patients and controls; however, they correlated with certain disease characteristics or risk factors, supporting their implication in T2D development.
However, certain limitations of this study need to be considered: (a) the fact that not all the known T2D-susceptibilty genes are examined, (b) the relatively small number of participants and RNA-seq samples tested; the latter was overcome by the subsequent qPCR validation of the proposed deregulated transcript variants, discriminating between the true-and false-positive results, though, RNA-seq false-negative results could not been ruled out, (c) no paired-analysis of both the transcript levels and the presence of T2D-related SNPs was conducted; search throughout the Blood eQTL browser and the GTEx portal served only as a guide for their association and does not adequately explore the genetic determinants of the gene-expression variation, (d) the significant difference in the median age of patients and controls tested ( Table 2); age associates with epigenetic changes 52 , thus it cannot be excluded as possible factor influencing the gene-expression variations observed in our cohort. However, after binomial multivariate analysis corrected for age and sex, CAPN10 tv3 still remain capable to predict T2D.
Nevertheless, by analyzing the expression patterns of a panel of the most highly-associated T2D-susceptibility genes, the current study offers suggestive data on the deregulated levels of certain transcript variants. Future research is required to elucidate their involvement in principal molecular and biochemical networks underlying T2D pathogenesis. Also, large-scale perspective clinical studies are needed to evaluate their potential to serve as possible biomarkers for its diagnosis, prognosis and/or monitoring.

Data Availability
The RNA-seq and qPCR raw data used to support the findings of this study are available from the corresponding author upon request.