Genetic factors affect the susceptibility to bacterial infections in diabetes

Diabetes increases the risk of bacterial infections. We investigated whether common genetic variants are associated with infection susceptibility in Finnish individuals with diabetes. We performed genome-wide association studies and pathway analysis for bacterial infection frequency in Finnish adults with diabetes (FinnDiane Study; N = 5092, Diabetes Registry Vaasa; N = 4247) using national register data on antibiotic prescription purchases. Replication analyses were performed in a Swedish diabetic population (ANDIS; N = 9602) and in a Finnish non-diabetic population (FinnGen; N = 159,166). Genome-wide data indicated moderate but significant narrow-sense heritability for infection susceptibility (h2 = 16%, P = 0.02). Variants on chromosome 2 showed suggestive association with reduced infection susceptibility (rs62192851, P = 2.23 × 10⁻⁷). Homozygous carriers of the rs62192851 effect allele (N = 44) had a 37% lower median annual antibiotic purchase rate than homozygous carriers of the reference allele (N = 4231): 0.38 [IQR 0.22–0.90] vs. 0.60 [IQR 0.30–1.20], P = 0.01. Variants rs6727834 and rs10188087, in linkage disequilibrium with rs62192851, replicated nominally in the FinnGen cohort (P < 0.05), but no variants replicated in the ANDIS cohort. Pathway analysis suggested the pathway of IRAK1-mediated NF-κB activation through IKK complex recruitment as a potential mediator of the phenotype. Common genetic variants on chromosome 2 may be associated with a reduced risk of bacterial infections in Finnish individuals with diabetes.

Infectious diseases have had a great impact on mortality throughout human evolutionary history. Genetic traits conferring protection from, or susceptibility to, infections have strongly shaped the survival and genetic makeup of populations through natural selection 1,2 . Despite modern medical technologies, including vaccines and antimicrobial therapies, infections still account for a large share of global morbidity and mortality. According to a recent World Health Organization (WHO) report, infections are the fourth most common cause of death globally and the most common cause of death in low-income countries 3 .
Previous research has demonstrated that genome-wide association studies (GWAS) can be used effectively to uncover genetic factors that increase susceptibility to infections [4][5][6] . However, GWAS reports on susceptibility to infections have nonetheless been scarce 7 , and published studies in this field have mainly focused on specific infections or pathogens, such as HIV 8 , malaria 9,10 , hepatitis 11,12 and tuberculosis 10,13,14 . Perhaps the most comprehensive GWAS of infection susceptibility found several loci in the HLA region on chromosome 6 to be associated with susceptibility to common infections, such as the common cold, pneumonia, streptococcal pharyngitis and urinary tract infections, in individuals of European ancestry 15 . Furthermore, GWAS have also shown that genetic factors can play the opposite, protective role in immunity against infectious agents 16 .
Diabetes is a common disease of increasing prevalence, currently affecting nearly 500 million people globally. Notably, several studies have demonstrated that diabetes increases the risk of infections [17][18][19] . This increased susceptibility may be mediated through defects in host immune defense mechanisms, including impaired neutrophil function 20,21 ; neutrophils play an essential role in defense, especially against bacterial pathogens. Indeed, we have previously found that individuals with type 1 diabetes have a roughly two-fold higher risk of bacterial infections than individuals without diabetes 22 . We also found that the risk of infections correlated with the stage of diabetic nephropathy as well as with glycemic control. Even though the increased risk of infections in diabetes is widely acknowledged, the underlying pathophysiological mechanisms are still largely unknown.
We hypothesized that part of the increased susceptibility to bacterial infections in diabetes could be modulated by common genetic factors. To identify such factors, we used GWAS and national register data on antibiotic purchases from two separate, comprehensive Finnish diabetes study cohorts: the Finnish Diabetic Nephropathy Study (FinnDiane, type 1 diabetes) and the Vaasa Diabetes Registry (DIREVA, all types of diabetes).
The effect of the lead locus on infection susceptibility. Interestingly, the top SNP, rs62192851, had a negative effect size. The effect allele, with an allele frequency of 9.0% in both the FinnDiane and the DIREVA cohorts, was associated with a lower bacterial infection frequency, contrary to the hypothesized increase in susceptibility to infections. In the FinnDiane cohort, stratification by rs62192851 genotype showed that antibiotic purchase rates decreased markedly and dose-dependently with an increasing number of effect alleles (Fig. 3). Compared to homozygous carriers of the reference allele (N = 4231), homozygous carriers of the effect allele (N = 44) had a 37% lower median annual antibiotic purchase rate (0.38 [IQR 0.22–0.90] vs. 0.60 [IQR 0.30–1.20], P = 0.01).

Table 1. Clinical characteristics of the individuals in the FinnDiane and DIREVA cohorts. Data are presented as means ± SD, median (IQR) or number (percentage) where appropriate. In the DIREVA cohort, data on blood pressure were available for 3231 individuals. HbA1c glycated hemoglobin, eGFR estimated glomerular filtration rate, LDL low-density lipoprotein, T1D type 1 diabetes.
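The 37% figure is a relative difference between two group medians. The sketch below, which assumes only the Python standard library, shows the arithmetic on the reported medians and how per-genotype medians could be computed; the toy rates in the usage lines are made up for illustration and are not study data.

```python
from statistics import median

def median_rate_by_genotype(rates_by_genotype):
    """Median annual antibiotic purchase rate for each genotype group.

    `rates_by_genotype` maps a genotype label (e.g. number of effect alleles)
    to a list of per-person annual purchase rates.
    """
    return {genotype: median(rates) for genotype, rates in rates_by_genotype.items()}

def relative_reduction(effect_median, reference_median):
    """Fractional reduction of the effect-allele group relative to the reference group."""
    return 1.0 - effect_median / reference_median

# Medians reported for rs62192851: 0.38 (two effect alleles) vs 0.60 (two reference alleles)
print(f"{relative_reduction(0.38, 0.60):.0%}")  # 37%

# Toy, made-up rates to show the stratification step itself
toy = {0: [0.3, 0.6, 1.2], 1: [0.2, 0.5, 1.0], 2: [0.2, 0.4, 0.9]}
print(median_rate_by_genotype(toy))  # {0: 0.6, 1: 0.5, 2: 0.4}
```

Since 1 − 0.38/0.60 ≈ 0.367, the reported 37% is this ratio rounded to the nearest percent.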

Replication analysis.
We tested all variants with P < 1 × 10⁻⁵ for replication in the Swedish ANDIS cohort (All New Diabetics In Scania), comprising individuals with type 1 or type 2 diabetes, and in the FinnGen cohort of individuals without diabetes. The top SNP from the meta-analysis, rs62192851, was not associated with the phenotype in either ANDIS (P = 0.60) or FinnGen (P = 0.52; Table 3). However, in the FinnGen cohort, two variants from the lead locus (rs6727834 and rs10188087) were nominally replicated (P = 0.03 and P = 0.04, respectively).
Exome sequencing analysis. We used whole exome sequencing (WES) data from 368 FinnDiane participants to fine-map genes within 0.5 Mb of the suggestive loci (P < 1 × 10⁻⁵) and tested their association with infection susceptibility. Gene burden tests restricted to missense variants of any frequency identified two genes associated with the phenotype: CRYGB (Crystallin Gamma B; P = 0.01) and RELL1 (RELT-like protein 1; P = 0.03). CRYGB is located on chromosome 2 close to the lead locus of the present study. RELL1 has been found to induce apoptosis of human epithelial cells through activation of the p38 MAPK pathway 23 . However, neither CRYGB nor RELL1 reached significance after correction for multiple testing (P thresh = 0.0002, correcting for the 208 genes within the 500 kb flanking regions), and both remain suggestive findings. No significant loss-of-function variants were associated with the phenotype. As the lead locus lies within a non-coding region, the WES data did not capture these variants.
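The gene-level threshold above is consistent with a Bonferroni correction over the 208 tested genes. The short sketch below assumes a family-wise α of 0.05, which the text implies but does not state, and checks the two reported burden-test P-values against it.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests

# 208 genes within the 500 kb flanking regions; alpha = 0.05 is an assumption
threshold = bonferroni_threshold(0.05, 208)
burden_p = {"CRYGB": 0.01, "RELL1": 0.03}  # gene burden P-values reported above
passing = [gene for gene, p in burden_p.items() if p < threshold]
print(f"{threshold:.1e}", passing)  # 2.4e-04 []
```

The exact value, 0.05/208 ≈ 2.4 × 10⁻⁴, is reported in the text rounded to 0.0002; neither gene passes under either value.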
In silico analysis. Expression quantitative trait loci (eQTL). The lead variant, rs62192851, and 18 other suggestive variants from the lead locus have previously been reported to affect the expression of CRYGD (Crystallin Gamma D) in heart tissue (Supplementary Table S1). CRYGD encodes one of the crystallin proteins that make up the vertebrate lens, and how it could affect host defense mechanisms against pathogens is unclear.
Chromatin conformation capture data. Of the 21 SNPs with suggestive P-values in the lead locus, nine were located on DNA fragments that interact with gene promoter regions in publicly available promoter capture Hi-C data (Supplementary Table S1). Six of these variants interacted with the promoter region of DYTN (Dystrotelin) in immune cell types, and three with the promoter region of CCNYL1 (Cyclin-Y-Like Protein 1). The exact role of DYTN is uncertain, but the gene has previously been associated with nephrolithiasis and encephalitis. CCNYL1 has been associated with spermatogenesis, but how the gene might affect infection susceptibility is unknown 24 .
Pathway analysis. We performed pathway analysis with the Pascal software to identify biological pathways enriched for association signals with infection susceptibility. Although no pathway was significant after correction for multiple testing, the top-ranked pathway showed suggestive association and is closely involved in immune responses: IRAK1 recruits IKK complex (P = 5.9 × 10⁻⁴). IRAK1 (Interleukin 1 Receptor Associated Kinase 1) plays a major role in initiating the innate immune response to microbial pathogens. Within the interleukin-1 receptor/Toll-like receptor (IL-1R/TLR) signaling pathway, activated IRAK1 interacts with tumor necrosis factor receptor-associated factor 6 (TRAF6), leading further downstream to activation of NF-κB, which initiates immune and inflammatory responses 25 . Importantly, the IRAK1 signaling pathway is triggered by microbial pathogens and microbial structures, including lipopolysaccharides (LPS), components of the outer membrane of gram-negative bacteria.

Discussion
In the present study, we hypothesized that genetic factors may affect the susceptibility to bacterial infections in individuals with diabetes. We performed a GWAS in two Finnish cohorts using comprehensive register data on prescription purchases of oral, systemic antibiotics, and adjusted the analysis for glycated hemoglobin (HbA1c) levels, a known environmental risk factor for bacterial infections. In the meta-analysis combining the results from both GWAS cohorts, we discovered a locus on chromosome 2 with altogether 21 common variants reaching suggestive P-values and in high LD with one another. The top variant, rs62192851, had a P-value of 2.23 × 10⁻⁷ in the meta-analysis. Moreover, stratification in the FinnDiane cohort by the genotype of the top variant showed that homozygous carriers of the effect allele had a 37% lower median annual antibiotic purchase rate than homozygous carriers of the reference allele.

Table 3. Replication of the top loci (P < 1 × 10⁻⁵) in the ANDIS (individuals with type 1 and type 2 diabetes) and the FinnGen (individuals without diabetes) replication cohorts.

www.nature.com/scientificreports/
Although the association signal is located in a non-coding region, eQTL data linked it to the crystallin gamma gene family, as nearly all variants in the lead locus affect the expression of CRYGD. The role of crystallin gamma proteins in immunological reactions or pathways is uncertain, as they have mainly been described as structural proteins of the lens. In promoter capture Hi-C data, the variants also interacted with the promoter regions of DYTN (Dystrotelin) and CCNYL1 (Cyclin Y Like 1). Fine-mapping of all suggestive loci with WES-based gene burden tests identified CRYGB and RELL1 as potentially associated with the phenotype in individuals with type 1 diabetes. Genome-wide pathway analysis based on variant P-values from the meta-analysis suggested that the IRAK1 signaling pathway, which upregulates inflammatory responses upon stimulation by microbial pathogens through IKK recruitment and NF-κB activation following IL-1R and TLR activation, may mediate our phenotype.
As we used a GWAS approach, we were unable to pinpoint the specific genes underlying the association, although we performed in silico and WES analyses to study this further. Notably, two additional genes previously associated with immunological reactions and host defense against pathogens lie within 200 kb of our lead locus: IDH1 and PIKFYVE. IDH1 is known to participate in epigenetic remodeling in cells of the myeloid lineage 26 , while PIKFYVE (Phosphatidylinositol-3-phosphate 5-kinase) has been found to affect the replication of viruses and intracellular bacteria 27,28 . Interestingly, PIKFYVE has also been found to affect GLUT4 translocation and has therefore been linked to insulin-dependent glucose transport 29 .
We found no significant associations with our phenotype in the replication analysis in the Swedish ANDIS cohort, which might suggest that our finding is specific to the Finnish population. This notion is supported by the nominal replication of two variants from the lead locus in the Finnish FinnGen cohort of individuals without diabetes, which further suggests that the findings are not necessarily diabetes-specific but may apply to individuals without diabetes as well. However, the possibility of a false positive finding must be acknowledged, as no variants attained genome-wide significance in the meta-analysis and only two variants from the lead locus were nominally replicated in the FinnGen cohort. Therefore, our results must be interpreted with caution until validation studies are conducted.
Our GWAS is unique in that we relied on national register data on antibiotic prescription purchases from Finland and Sweden instead of recorded diagnoses of bacterial infections. This was feasible because antibiotics cannot be purchased without a prescription in Finland, so the National Drug Prescription Purchase Register reliably reflects bacterial infections treated in outpatient care. This approach also allowed us to include large cohorts, as register data on prescription purchases are available for research purposes and offer extensive retrospective data. Furthermore, including all types of antibiotics made it possible to investigate general susceptibility to bacterial infections, capturing bacterial infections treated in outpatient care irrespective of their type.
We also note some limitations of our study. As the phenotype only includes outpatient prescriptions, severe infections treated in hospitals were not included. Furthermore, we have previously shown that the presence and severity of diabetic kidney disease is a strong risk factor for bacterial infections. As we did not adjust our GWAS analysis for the stage of diabetic kidney disease or kidney function, we cannot exclude the possibility that our findings are associated with diabetic kidney disease rather than with bacterial infections. However, in the FinnDiane cohort, the carrier status of the top variant rs62192851 significantly affected the antibiotic purchase frequency, which supports an association between this locus and bacterial infection frequency. We also censored follow-up at the onset of end-stage kidney disease and adjusted the analysis for glycemic control. As glycemic control correlates strongly with the severity of diabetic kidney disease, this adjustment indirectly accounted for diabetic kidney disease as well, further suggesting that our findings relate to bacterial infections. On the other hand, genetic factors predisposing individuals with diabetes to infections may act through hyperglycemia-associated pathways; owing to the adjustment for glycemic control, such effects were not investigated in the present study.
Clinicians may have a lower threshold for prescribing antibiotics to individuals with diabetes, which could limit how well our results extrapolate to the healthy non-diabetic population. Our data also included prophylactic antibiotic purchases; however, we argue that these still reflect an increased risk of an infection requiring antibiotic treatment, which in turn reflects susceptibility to infections. As all subtypes of oral antibiotics were included in the study, our phenotype is fairly heterogeneous. This may have diluted the association signal and reduced statistical power compared with studying specific infections, which would have yielded a more specific phenotype. On the other hand, because of this heterogeneity, our findings may point to common immune checkpoints shared across responses to several different bacterial pathogens.
In conclusion, the present study is, to our knowledge, the first to explore genetic factors affecting infection susceptibility in individuals with diabetes. Our results indicate that common variants on chromosome 2, captured by GWAS, may be associated with a reduced risk of infections in individuals with diabetes. Pathway analysis suggested that genetic variation in the IRAK1 pathway may also be involved in infection susceptibility. More studies are needed to further elucidate the genetic factors behind the increased susceptibility to bacterial infections in individuals with diabetes.

Methods
Cohorts and data collection. For the main analyses, individuals were included from two separate Finnish cohorts: individuals with type 1 diabetes (N = 5092) from the FinnDiane study, and individuals with all types of diabetes (N = 4247) from the DIREVA study (Fig. 4). FinnDiane is an ongoing nationwide multicenter survey, founded in 1997 to elucidate genetic and environmental risk factors for diabetic complications, with an emphasis on diabetic nephropathy 30 . DIREVA is an ongoing regional study, founded in 2007 to improve and personalize the treatment of diabetes, that includes over 7000 individuals with diabetes from the Vaasa hospital district. In the FinnDiane cohort, type 1 diabetes was defined as onset of diabetes before 40 years of age and start of insulin treatment within one year of onset. The FinnDiane protocol is in accordance with the Declaration of Helsinki and has been approved by the local ethics committees at each study center. The DIREVA study has been approved by the ethics committees of the Vaasa Hospital District and Turku University Hospital. All participants gave written informed consent prior to participation. Bacterial infections were identified using comprehensive nationwide register data. Data on antibiotic prescription purchases from 1 January 1995 to 31 December 2014 were collected from the Finnish National Drug Prescription Register. All systemic oral antibiotics were identified in the register data using the Anatomical Therapeutic Chemical (ATC) classification system code J01. In Finland, oral systemic antibiotics are not available over the counter in pharmacies and require a physician's prescription. Consequently, antibiotic purchases listed in the prescription purchase register reflect bacterial infections treated in outpatient care.
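As an illustration of the register extraction step, the sketch below counts purchases whose ATC code falls under J01 within the 1995–2014 window. The (person_id, atc_code, purchase_date) record layout and the example records are hypothetical simplifications, not the actual register format.

```python
from collections import Counter
from datetime import date

# Register window described above
WINDOW_START = date(1995, 1, 1)
WINDOW_END = date(2014, 12, 31)

def count_antibiotic_purchases(purchases):
    """Count systemic antibiotic purchases (ATC codes starting with J01) per person.

    `purchases` is an iterable of (person_id, atc_code, purchase_date) tuples,
    a hypothetical, simplified layout of the prescription register extract.
    """
    counts = Counter()
    for person_id, atc_code, purchase_date in purchases:
        in_window = WINDOW_START <= purchase_date <= WINDOW_END
        if in_window and atc_code.startswith("J01"):
            counts[person_id] += 1
    return counts

records = [
    ("p1", "J01CA04", date(2003, 2, 11)),   # J01 antibiotic inside the window: counted
    ("p1", "J01FA10", date(2010, 9, 30)),   # J01 antibiotic inside the window: counted
    ("p2", "A10AB01", date(2005, 5, 5)),    # not a J01 code: ignored
    ("p2", "J01XE01", date(1994, 12, 31)),  # before the window: ignored
]
print(count_antibiotic_purchases(records))  # Counter({'p1': 2})
```

Matching on the "J01" prefix captures every antibacterial for systemic use, because ATC codes are hierarchical.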
Genotype imputation with the minimac3 software 31 and the 1000 Genomes reference panel, followed by quality control filtering (minor allele frequency [MAF] ≥ 0.01 and imputation r² ≥ 0.7), resulted in 8.4 × 10⁶ and 8.6 × 10⁶ SNPs in FinnDiane and DIREVA, respectively. In the GWAS analyses, estimated allele dosages were used, and the analyses were performed with the RVTESTS software 32 using a linear mixed model. To assess infection frequency, an infection susceptibility risk score was calculated for each subject as the log-transformed mean number of antibiotic purchases per follow-up year. Because of an excess of zeros in the data, a small constant (0.5 × the minimum non-zero value) was added before natural-log (logₑ) transformation. As end-stage kidney disease increases the risk of infections dramatically, follow-up years during or after which individuals were diagnosed with end-stage kidney disease were censored. Follow-up years prior to the onset of diabetes were also excluded. As poor glycemic control is a well-known environmental risk factor for infections, the mean HbA1c of each subject during follow-up was calculated and added as a covariate. In summary, the following covariates were included in the GWAS analysis: average age during follow-up, sex, mean HbA1c during follow-up, age at onset of diabetes, genotyping batch components and the kinship matrix. Results from the GWAS analyses in the FinnDiane and DIREVA cohorts were inverse normalized and combined with fixed-effects meta-analysis using the METAL software.
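The phenotype construction and cohort combination described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes whole calendar years for follow-up, reads "minimum non-zero value" as the smallest non-zero rate in the cohort, and, because the text does not state which METAL weighting scheme was used, shows the standard inverse-variance weighted fixed-effects estimate (METAL's standard-error scheme) rather than its sample-size-based alternative.

```python
import math

def annual_purchase_rate(purchase_years, first_year, last_year,
                         diabetes_onset_year, eskd_year=None):
    """Mean number of antibiotic purchases per eligible follow-up year.

    Years before diabetes onset are excluded; the year of an end-stage kidney
    disease (ESKD) diagnosis and all later years are censored. Whole calendar
    years are an assumed simplification.
    """
    start = max(first_year, diabetes_onset_year)
    end = last_year if eskd_year is None else min(last_year, eskd_year - 1)
    n_years = end - start + 1
    if n_years <= 0:
        return None  # no eligible follow-up
    n_purchases = sum(1 for year in purchase_years if start <= year <= end)
    return n_purchases / n_years

def risk_scores(rates):
    """log_e(rate + c), with c = 0.5 x the smallest non-zero rate in the cohort."""
    c = 0.5 * min(r for r in rates if r > 0)
    return [math.log(r + c) for r in rates]

def fixed_effects_meta(betas, ses):
    """Inverse-variance weighted fixed-effects estimate, standard error and two-sided P."""
    weights = [1.0 / se ** 2 for se in ses]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    p = math.erfc(abs(beta / se) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return beta, se, p

# Hypothetical individual: diabetes onset 2000, ESKD diagnosed 2005
rate = annual_purchase_rate([2000, 2001, 2001, 2005], first_year=1995, last_year=2014,
                            diabetes_onset_year=2000, eskd_year=2005)
print(rate)  # 0.6 (three purchases over the five eligible years 2000-2004)
```

The constant c keeps individuals with zero purchases in the analysis, since log(0) is undefined.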
Narrow-sense heritability. The proportion of phenotypic variance attributable to additive genetic factors was estimated in the FinnDiane cohort using a genetic relationship matrix (GRM) of unrelated individuals in a mixed linear model, via the restricted maximum likelihood (REML) approach. This was performed using the Genome-wide Complex Trait Analysis (GCTA) software 33 .
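The REML step estimates the genetic and residual variance components, σ²g and σ²e, with h² = σ²g / (σ²g + σ²e); its input is the GRM. The sketch below builds only the GRM from standardized genotypes in plain Python. For simplicity it applies the off-diagonal estimator used by GCTA to the diagonal too (GCTA uses a slightly different diagonal estimator), and it does not implement REML itself.

```python
def genetic_relationship_matrix(genotypes):
    """Standardized-genotype GRM.

    genotypes[i][j] = count (0, 1 or 2) of the coded allele at SNP i for individual j.
    A_jk = mean over SNPs of (x_ij - 2p_i)(x_ik - 2p_i) / (2p_i(1 - p_i)).
    """
    n_ind = len(genotypes[0])
    snps = []
    for row in genotypes:
        p = sum(row) / (2 * n_ind)   # sample frequency of the coded allele
        if 0 < p < 1:                # monomorphic SNPs carry no information
            snps.append((row, p))
    if not snps:
        raise ValueError("no polymorphic SNPs")
    grm = [[0.0] * n_ind for _ in range(n_ind)]
    for row, p in snps:
        denom = 2 * p * (1 - p)
        centred = [x - 2 * p for x in row]
        for j in range(n_ind):
            for k in range(n_ind):
                grm[j][k] += centred[j] * centred[k] / denom
    m = len(snps)
    return [[value / m for value in r] for r in grm]

# Three hypothetical individuals, three SNPs (the last one monomorphic and skipped)
print(genetic_relationship_matrix([[0, 1, 2], [2, 1, 0], [0, 0, 0]]))
```

The toy matrix is far too small to be meaningful. It only illustrates the centring and scaling step that makes entries comparable across SNPs with different allele frequencies.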
Replication cohorts. SNPs with a P-value < 1 × 10⁻⁵ were included in the replication analyses performed in two separate cohorts: the Swedish ANDIS cohort and the Finnish FinnGen cohort. The ANDIS cohort consisted of individuals with type 1 or type 2 diabetes (n = 9602), and the FinnGen cohort consisted of individuals aged over 35 years without type 1 or type 2 diabetes (n = 159,166). As in the main cohorts, antibiotic purchases were identified using the ATC code J01. The phenotype in the FinnGen cohort was defined as the total number of antibiotic purchases during follow-up, with adjustment for age, sex, genotyping batch and principal components. Details on the ANDIS and FinnGen studies have been reported previously 34,35 .
Exome sequencing analysis. Whole exome sequencing (WES) data from individuals in the FinnDiane cohort (N = 368) were used to fine-map the exonic regions of the most significant loci (P < 1 × 10⁻⁵) from the meta-analysis. For each locus, the top variant was selected, and all genes within 0.5 Mb up- and downstream of it were identified. Gene burden tests of missense and loss-of-function variants in these genes were performed with the gene-aggregate test SKAT-O, using variant frequency filters of 0.05 and 0.5. WES analyses were performed with the RVTESTS software.
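Selecting candidate genes is an interval-overlap query around each top variant. The sketch below assumes that a gene is included if any part of it overlaps the ±0.5 Mb window, not only if it lies fully inside; the text does not specify which rule was used. The gene names and coordinates are hypothetical placeholders.

```python
FLANK_BP = 500_000  # 0.5 Mb up- and downstream of the top variant

def genes_in_window(variant_pos, genes, flank=FLANK_BP):
    """Return names of genes overlapping [variant_pos - flank, variant_pos + flank].

    `genes` is a list of (name, start, end) tuples on the same chromosome as the
    variant, with coordinates in the same genome build as `variant_pos`.
    """
    lo, hi = variant_pos - flank, variant_pos + flank
    return [name for name, start, end in genes if start <= hi and end >= lo]

# Hypothetical genes around a hypothetical variant at position 1,000,000
genes = [
    ("GENE_A", 100_000, 200_000),      # ends before the window
    ("GENE_B", 1_600_000, 1_700_000),  # starts after the window
    ("GENE_C", 1_400_000, 1_450_000),  # inside the window
]
print(genes_in_window(1_000_000, genes))  # ['GENE_C']
```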
In silico analysis. Pathway analysis. As pathway analysis can improve statistical power and yield discoveries even in small samples with complex phenotypes, we performed pathway scoring with the Pascal software 36 . Pascal uses variant P-values from GWAS summary statistics and a modified Fisher method, which removes the need to select an arbitrary significance threshold and improves power. We included variants with MAF > 0.01 and genes with fewer than 3000 variants. The significance threshold corrected for multiple testing was P thresh = 2.9 × 10⁻⁵.
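For orientation, the classic (unmodified) Fisher method combines k P-values through X = −2 Σ ln pᵢ, which under the null follows a χ² distribution with 2k degrees of freedom. The sketch below implements only this textbook version. It is not Pascal's algorithm. Classic Fisher assumes independent P-values, which SNPs in LD violate, and Pascal's modifications address this among other things.

```python
import math

def fisher_combined_p(p_values):
    """Classic Fisher's combined probability test.

    X = -2 * sum(ln p) ~ chi-squared with 2k degrees of freedom under the null.
    For even degrees of freedom the survival function has the closed form
    exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!.
    """
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i    # builds (X/2)^i / i! incrementally
        total += term
    return math.exp(-half_x) * total

print(fisher_combined_p([0.3]))            # a single P-value is returned unchanged
print(fisher_combined_p([0.01, 0.02, 0.2]))
```

The closed-form survival function avoids any dependency on a statistics library, because 2k is always even.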
Chromatin conformation capture data. Variants in non-coding regions may affect the expression of genes that are not necessarily nearby in the linear genome, by physically interacting with the genes' promoter regions through chromatin conformation. To study such interactions for our findings, we used the freely available promoter capture Hi-C data in the ChiCp browser (chicp.org) 37 . Each suggestive variant (P < 1 × 10⁻⁵) was entered separately into the ChiCp browser. Interactions with a score above 5 were considered significant. Interactions were studied in M0, M1 and M2 macrophages, monocytes, neutrophils, and naïve and total B cells, CD4 cells and CD8 cells. Only the most significant finding for each variant is reported.
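The reporting rule above (score > 5, strongest interaction per variant) amounts to a filter followed by a per-key maximum. The sketch below assumes the browser results have been exported as (variant_id, gene, cell_type, score) tuples. This is a hypothetical layout, not ChiCp's own output format, and the variant IDs and genes are placeholders.

```python
def strongest_interactions(interactions, min_score=5.0):
    """Keep, per variant, the highest-scoring promoter interaction with score > min_score.

    `interactions` is an iterable of (variant_id, gene, cell_type, score) tuples.
    """
    best = {}
    for variant, gene, cell_type, score in interactions:
        if score > min_score and (variant not in best or score > best[variant][2]):
            best[variant] = (gene, cell_type, score)
    return best

hits = [
    ("rsA", "GENE_X", "monocytes", 4.9),      # below the threshold
    ("rsA", "GENE_Y", "neutrophils", 6.1),
    ("rsA", "GENE_Z", "macrophage M0", 7.3),  # strongest for rsA
    ("rsB", "GENE_Y", "CD4", 5.0),            # not strictly above 5
]
print(strongest_interactions(hits))  # {'rsA': ('GENE_Z', 'macrophage M0', 7.3)}
```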

Data availability
The ethics statement and the informed consent do not allow sharing of individual-level data.