Omics data integration identifies ELOVL7 and MMD gene regions as novel loci for adalimumab response in patients with Crohn’s disease

Response to anti-TNF therapy is of pivotal importance in patients with Crohn’s disease (CD). Here we integrated our and previously reported PBMC derived transcriptomic and genomic data for identification of biomarkers for discrimination between responders and non-responders to anti-TNF therapy. CD patients, who were naïve with respect to the treatment with biologicals, were enrolled in the study. DNA and RNA were extracted from peripheral blood mononuclear cells. RNA-seq was performed using BGISEQ-500. Genotyping was performed using Infinium Global Screening Array. Association regressions were carried out with 12 week response to adalimumab as an outcome variable. RNA-seq analysis confirmed 7 out of 65 previously suggested genes involved in anti-TNF response. Subsequently, analysis of single nucleotide variants in regions of confirmed genes identified 5 variants near MMD and two in ELOVL7 intronic regions associated with treatment response to anti-TNF. Functional analysis has shown that rs1465352, rs4422035 and rs78620886 are listed at H3K9ac_Pro histone modification epigenetic mark. The present study confirmed MMD and ELOVL7 involvement in anti-TNF response and revealed that the regulation of MMD and ELOVL7 gene regions in ADA response may be a part of a complex interplay extending from genetic to epigenetic and to transcriptomic level.

Inflammatory bowel disease (IBD) includes both Crohn's disease (CD) and ulcerative colitis (UC) and results from improper inflammatory response to microbiota in a host with genetic susceptibility 1 . With tumor necrosis factor alpha (TNF-α) being the key cytokine in inflammation, many anti-TNF therapies emerged and dominated in the treatment of CD 2,3 . On one hand, disease management was significantly improved with anti-TNF agents, such as adalimumab (ADA) and infliximab (IFX) 4,5 , but on the other hand, 30% of patients do not respond to the anti-TNF treatment and of those who initially benefit, up to 40% lose their response later 2,4,6,7 . Moreover, unnecessary continuation of the anti-TNF therapy may severely impede patients' quality of life and impose adverse effect risk without clinical justification with emerging of an additional challenge of loss of response management 8,9 . Thus, identification of anti-TNF therapy response predictors is of pivotal importance and is posing an additionally challenge in the last decade. Prediction of response is also needed as we now have other therapies with different targets available 10 . Several studies have focused on genetic background and gene expression [11][12][13][14][15][16][17] as potential biomarkers for anti-TNF treatment response. Currently the genomics markers of anti-TNF response in CD are not reaching significant thresholds and practically no reproducibility between genetic and expression markers exists, as identified by the first integration of genomic and expression data prior to anti-TNF treatment in a systematic review 18 . Prediction profiles have been mostly identified using epithelial gene expression data for which colonoscopy and biopsy sampling are needed 16,17 . However, obtaining colon biopsies is a highly invasive procedure. Studies which would focus on non-invasive genetic expression biomarkers, such as gene Results RNA-seq analysis. Using our RNA sequencing data we performed an expression study using previously suggested candidate genes discriminating anti-TNF response in CD patients. For that, we used two different approaches. In first approach, the analysis was adjusted to deconvoluted fractions of PBMCs and has identified 11 differentially expressed genes out of which MMD, ELVOL7 and BMP6 remained significantly down-regulated in responders relative to non-responders after correction (Table 2, Fig. 1). In the second approach, analysis was carried out unadjusted in order to obtain PBMC panel. The analysis has confirmed 10 differentially expressed genes out of which 4 (GPR84, EPSTI1, IFI6, MX1) remained significantly up-regulated after correction for multiple comparison in responders relative to non-responders (Table 2, Fig. 1).

Integration to genomics.
Using an integrative transcriptomic-genomic approach we analyzed single nucleotide variants ranging ± 100 kb from previously identified differentially expressed genes in 84 patients with CD. Association analysis has identified 5 statistically significant variants near gene MMD ( Fig. 2A) and 2 statistically significant variants in ELOVL7 gene region (Fig. 2B). For single nucleotide polymorphisms (SNPs) rs1465352, rs4422035, rs9892429, rs9893820 and rs11656799 near gene MMD the results have shown significant higher alternative allele frequency in non-responders in comparison to responders (Table 3). For the most significant rs4422035 the analysis has shown higher alternative allele frequencies in non-responders (Allele C: 0.69) in comparison to responders (Allele C: 0.45) (p = 0.018; OR 0.22; CI 0.09-0.55). Additional LD assessment has shown that that rs4422035 and rs1465352 are in linkage disequilibrium (D′: 1.0; r 2 : 0.87; p < 10 -4 ). Identified rs9892429, rs9893820 and rs11656799 have shown higher linkage disequilibrium with rs1465352 (D′: 1.0; r 2 : 0.96; p < 10 -4 ) as in comparison to most significant rs4422035, which is also evident from the Table 3. Similar effect was observed also for variants identified in ELOVL7 gene region. Both identified variants have shown the same trend of alternative allele frequencies (Table 3). For the most significant rs78620886 the analysis has shown higher alternative allele frequencies in non-responders (Allele A: 0.26) in comparison to responders (Allele A: 0.03) (p = 0.021; OR 0.05; CI 0.01-0.27). Additional LD analysis has also shown that rs78620886 is in linkage disequilibrium with rs9291695 (D′: 1.0; r 2 : 0.95; p < 10 -4 ). Table 1. Previously identified anti-TNF response genes in PBMC and integrated genomic markers. CD Crohn's disease, RA rheumatoid arthritis, IBD inflammatory bowel disease, IBDu inflammatory bowel disease unclassified, UC ulcerative colitis.

Mendelian randomization analysis.
An additional Inverse-variance random-effect Mendelian Randomization (MR) analysis was performed. First, variants ranging ± 100 kb from MMD and ELOVL7 were included into conditional regressions in order to detect independent signals conditioned to the identified most significant SNPs. Conditioning results have identified statistically significant rs12943795 besides rs4422035 for MMD, and rs182724486 and rs149880548 besides rs78620886 for ELOVL7. Statistically significant conditioned independent variants were further assessed for eQTLs in corresponding tissues/cells in which chromatin state assignment was listed in HaploReg and were included into subsequent MR analysis. MR analysis has proven the association of ELOVL7 with response to adalimumab (p = 0.046; SE 2.812; CI95 11.13 to − 0.11). For MMD no MR analysis was performed due to unavailable eQTLs for independent rs12943795 and rs4422035 SNPs in corresponding tissues/cells.
Validation of genes using RT-qPCR. In order to assess and validate the findings of RNA-seq analysis and subsequent genomic variants, RT-qPCR was performed. First, reference genes were checked for stability between responders and non-responders. Both genes, ACTB (p = 0.768) and B2M (p = 0.782) proved to be stably expressed in both groups. Subsequently gene expression of MMD and ELOVL7 was measured in PBMCs. Both genes were down-regulated in responders, but statistical significance was observed only at ELOVL7, which was 1.43-fold down-regulated in responders relative to non-responders (p = 0.016; β − 0.535; SE 0.214) (Fig. 4B). MMD was 1.10-fold down-regulated in responders relative to non-responders (p = 0.498; β − 0.141; SE 0.656) (Fig. 4A). Additionally eQTLs in PBMCs were assessed for SNPs where chromatin state assignment was listed.

Discussion
Anti-TNF therapy is of pivotal importance in patients with CD and genomic biomarkers that would tailor personalized treatment are currently needed, in particular as new biologicals for CD treatment have recently emerged. In the recent systematic review of anti-TNF genomic biomarkers in CD, it was noted that almost no candidate gene could be confirmed in an independent association study and no overlap was detected among genes identified with association and gene expression studies 18 . In the present study, we have adopted a new approach 19 for the integration of PBMCs derived transcriptomic and genomic data. To our knowledge, this is the first time deconvolution was applied in order to account for PBMC composition in anti-TNF response transcriptomics, which identified two novel genetic loci on chromosomes 17 near gene MMD and 5 in gene ELOVL7, associated with anti-TNF response in CD patients. For that, we included patients with CD, who were naïve with respect to the treatment with biologicals and who were subsequently treated with ADA and investigated PBMC derived transcriptomic-genomic data integration with targeting on previously identified gene expression panels for differentiation of responders from non-responders to anti-TNF therapy. Additionally, we investigated independently confirmed and previously integrated genomic and expression (RNA and protein) markers of anti-TNF therapy response in CD patients. Thus, the targeted expression panel consisted altogether of 65 genes, which were previously identified in 2126 First, using RNA-seq and using 3 responders and 3 non-responders to ADA we confirmed differentially expressed genes from previously identified gene panel. The model fitting for estimation of differentially expressed genes was carried out in two different ways-adjusted to deconvolution results and not adjusted in order to obtain PBMC expression picture. The results have shown that in deconvoluted analysis MMD, ELOVL7 and BMP6 remained significantly differentially expressed after correction for multiple comparisons. All three genes were down-regulated in responders relative to non-responders (up-regulated in non-responders relative to responders). Using unadjusted model fitting we observed the confirmation of GPR84, EPSTI1, IFI6, MX1, which remained significantly differentially expressed after the application of multiple comparison correction. All four were found to be up-regulated in responders relative to non-responders (down-regulated in non-responders relative to responders). Subsequently, confirmed genes were further analyzed using integration with genomic data obtained from 84 CD patients, who were naïve with respect to the treatment with biologicals, using a genotype microarray and using an association analysis with 12 week response as an outcome variable. Imputed high quality single nucleotide variants were analyzed using a margin of ± 100 kb from confirmed genes in RNA-seq analysis in order to include variants in regulatory gene regions. The results have identified two significant signals, one near gene MMD and one in ELOVL7 gene region after application of Bonferroni-like correction using empirical autocorrelation estimation, but no statistically significant signals were observed in other 5 genes. Both MMD and ELOVL7 were identified in deconvoluted approach. Thus, obtained results are pointing to increased robustness of analysis when using deconvolution of PBMCs. As previously shown using deconvolution-meta-analysis paradigm, adjustment to estimation of cell proportions can capture differential expression otherwise masked due to cellular composition variation 22 . Expression of MMD (Monocyte to macrophage differentiation-associated) gene can be detected in all tissues and its' up-regulation is observed upon monocyte differentiation, which modulates the production of TNF-α and nitrous oxide (NO) 25 . TNF was previously identified among key players of the Schmitt et al. proposed model of anti-TNF non-response 26 and was identified as a sole key player in the baseline response prediction screening 18 . It was also previously suggested that there is an association between the loss of anti-TNF response and ROS response 27 , which is in concordance with our findings as up-regulation of MMD gene modulates also the production of NO. ELOVL7 (ELOVL Fatty acid elongase 7) gene is a member of 7 ELOVL isozymes which catalyze the first step of very long-chain fatty acid elongation cycle 28 . In the context of IBD, ELOVL7 was previously associated only with protective association for hidradenitis suppurativa 29 . In aforementioned regions, the analysis identified SNPs rs1465352, rs4422035, rs9892429, rs9893820 and rs11656766 located downstream of MMD and SNPs rs9291695 and rs78620886 located in ELOVL7 intronic regions. OR calculated for alternative allele for all SNPs indicates higher probability for non-responding to ADA therapy. To the best of our knowledge this is the first time that the aforementioned SNPs are associated with response to anti-TNF therapy in CD. Furthermore, functional analysis for MMD rs1465352 and rs4422035, and ELOVL7 rs78620886 has shown that these SNPs are listed in HaploReg v4.1 at H3K9ac_Pro histone modification epigenetic mark in monocytes and lymphocytes, respectively. Histone modifications are implicated in influencing gene expression by establishing global chromatin environments and H3K9ac (Histone 3 Lysine 9 acetylation) histone modification is enriched in promoter regions containing critical regulatory elements necessary for transcription 30 . H3K9 is a residue that can be acetylated or methylated and H3K9ac was shown to be dramatically reduced during chromosome condensation 31 . Moreover, H3K9ac is associated with active promoters and is considered as a hallmark of active transcriptions 32 . As MMD and ELOVL7 were both up-regulated in non-responders to ADA, we hypothesize that locations of rs1465352, rs4422035 and rs78620886 might play a significant role in histone modifications in terms of H3K9 residue acetylation and up-regulation of gene expression. Using ENSEMBL 33 and GTExportal database 34 statistically significant eQTLs with MMD and ELOVL7 were also listed. Additionally, the implication of aforementioned SNPs in ADA response was also confirmed using random forest machine learning algorithm and ROC analysis, which has additionally confirmed the association of these three SNPs with ADA response in patients with CD. In order to fully address the implication of both genes in ADA response, a Mendelian randomization integrating eQTL and association analysis data was performed. Gene ELOVL7 was subsequently again confirmed as a significant candidate, however no analysis was performed for MMD due to unavailability of eQTL data for selected variants in corresponding tissues/cells in used databases 33,34 . Furthermore, monocyte to macrophage differentiation was found to go along with profound changes in lipid-related transcriptome, leading to an induction of fatty-acid elongation 35 , which can elucidate the interplay of MMD and ELOVL7 up-regulation in ADA non-responders. These findings are further supported by previously shown genetically influenced abnormalities in fatty acid profiles in IBD 36 . The main limitation of the present study is the limited sample size in transcriptome analysis. Nevertheless, we believe that using trimmed mean of M values method (TMM) normalization implemented in edgeR and the variance modeling at the observational level (VOOM) method implemented in the limma package before applying empirical Bayes over a linear model, made our analysis stringent enough to provide reliable results. The aforementioned concern was also addressed using RT-qPCR analysis for estimation of MMD and ELOVL7 gene expression in PBMCs in a larger cohort in order to validate the findings. However, MMD and ELOVL7 were identified using an approach with deconvolution of PBMCs, thus differences between responders and non-responders may not have been fully revealed using RT-qPCR. Obtained results have shown that both, MMD and ELOVL7 tend to be up-regulated in non-responders relative to responders, but only ELOVL7 reached statistical significance, which is consistent with RNA-seq findings. Furthermore, for SNPs where chromatin state assignment was listed, no trend of association with MMD and ELOVL7 was observed in PBMCs, which is also consistent with reported eQTLs in whole blood. Moreover, using RT-qPCR we observed a strong correlation between MMD and ELOVL7, which further supports the interplay between these two genes and serves as additional supporting evidence of monocyte to macrophage differentiation and changes in lipid-related transcriptome. In addition, based on our results functional studies www.nature.com/scientificreports/ are warranted and could further unravel the interplay between MMD, ELOVL7 and H3K9ac in response to anti-TNF therapy. We also acknowledge that the homogenousgroup of enrolled IBD classified CD patients and using only one anti-TNF agent represents the strength of our study design.
In summary, the present study integrated transcriptomic and genomic data obtained from CD patients' PBMCs and identified two novel genetic loci, on chromosomes 17 and 5, associated with anti TNF response in CD patients and suggested MMD and ELOVL7 as best candidates for involvement in anti-TNF response. Moreover, the present study revealed that the implication and regulation of MMD and ELOVL7 gene regions in ADA response may be a part of a complex interplay extending from genetic to epigenetic and to transcriptomic level. This knowledge could be further translated into new clinical non-invasive baseline biomarkers for adalimumab response in patients with CD.

Methods
All methods were carried out in accordance with relevant guidelines and regulations.
Enrolled subjects. We included 84 Slovenian patients with CD who fulfilled the criteria for anti-TNF-α drug ADA (Humira, Abbott Laboratories, IL, USA), and who were naïve with respect to the treatment with biologicals. All enrolled subjects were of Slovenian (Caucasian-Central Europe) ethnicity. Baseline demographics of the patients are shown in Table 5. Inclusion criteria were refractoriness to corticosteroids, adverse effects to corticosteroids and lost response or intolerance to IFX as described previously 12 . Exclusion criteria were defined as presence of CD complications as stenosis, abscesses, total colectomy, history of murine proteins allergy, active tuberculosis or a serious infection in the last 3 months, pregnancy or lactation and malignancy 37 . First, a loading dose of 160 mg of ADA was administered, followed by 80 mg after 14 days and maintenance dose of 40 mg every other week. Azathioprine, 5-aminosalycylates, corticosteroids or antibiotic concomitant treatment was allowed if the dosage was stable in the last 3 months. The present study was evaluated and approved by Slovenian National Committee for Medical Ethics (KME 80/10/07, 21p/12/07). Written informed consent was obtained from all enrolled subjects. In case of minors, written informed consent was obtained from parents and/or legal guardians. Response to treatment was measured using IBD questionnaire (IBDQ) after 12 weeks of treatment. Response was defined as an increase in score > 22 points in comparison of pre-treatment score or total score > 170 points 38,39 . Extraction of nucleic acids. DNA and RNA were extracted from peripheral blood mononuclear cells (PBMC) using TRI-reagent (Merck, Darmstadt, Germany) according to manufacturer's instructions. Purity and concentration of nucleic acids was determined using Synergy 2 spectrophotometer (Biotek, Winooski, VT, USA). Integrity of RNA was checked using agarose gel electrophoresis and 2100 Bioanalyzer Instrument (Agilent, Santa Clara, CA, USA) with RNA 6000 Nanochip.

RNA-seq analysis.
For RNA-seq analysis, 3 responders and 3 non-responders were selected. These patients were selected for RNA-seq based on available data for continuously measured treatment response, which was recorded at weeks 4, 12, 20 and 30. The selected 6 patients qualified as 3 were consistent responders and 3 consistent non-responders during all check point measurements. RNA RIN number was > 9.0 and 28S/18S ratio between 1.5 and 2.0 for all 6 samples. Both, lncRNA and mRNA 100 bp paired-end libraries were constructed using MGIEasy rRNA Depletion Kit (MGITech, Shenzen, China) and MGIEasy RNA Library Prep Set (MGITech) and using the BGISEQ-500 instrument (BGI, Hong Kong) at BGI facilities (BGI). Data analysis was performed using R 4.0.2 environment (R Core Team 2020, Vienna, Austria). Raw .fastq files were first assessed for quality using FastQC 0.11.9 software 40 . Technical sequences and sequencing adapters were trimmed using Trimmomatic 0.39 tool 41 . Paired-end reads were aligned to the hg19 reference genome using Rsubread 2.2.4 R package 42,43 . Mapped reads were counted and assigned to genomic features using featureCounts 44 with requirement of both ends to be mapped. Counts per million (CPMs) were calculated using edgeR 3.30.3 R package 45 . Low expressed genes were filtered out based on CPMs corresponding to read counts of 10 and retained genes were normalized using the trimmed mean of M values method (TMM) 46 . Subsequently, mean-variance modeling at the observational level transformation (VOOM) was applied 47 . Differential expression of responders rela- www.nature.com/scientificreports/ tive to non-responders was determined using linear models and empirical bayes implemented in limma 3.44.3 R package 48 using two approaches. In first approach, the linear model was adjusted based on the deconvoluted peripheral blood mononuclear cell composition data. Deconvolution was used to obtain an estimation of the abundances of member cell types in a mixed cell population, using gene expression data in terms of proportions of different white cell subtypes in PBMCs 49 . Using raw counts, transcripts per million (TPM) were calculated, zero values were removed and data was deconvoluted using CIBERSORT 49 and LM22 49 signature matrix. Sample composition in terms of proportions of lymphocytes, monocytes/macrophages and neutrophils was used as a covariate in the model. Zero estimated cell proportions were excluded from analyses. In the second approach, the linear model was left unadjusted in order to obtain PBMC gene profile picture. Differential expression was considered for genes with adjusted p-value < 0.05.

Association analysis. Samples obtained from 84 enrolled individuals were genotyped using Infinium
Global Screening Array (GSA_24v1) (Illumina, San Diego, California, USA). Quality control of the genotyped data was performed as described previously 50 . Further steps of analysis were based on an integrative transcriptomic-genomic approach 19 . Subsequently, imputation was performed using Michigan Imputation Server Mini-mac3 genotype imputation algorithm and using HRC r1.1 2016 reference panel and SHAPEIT v2.r790 phasing 51 .
Association between responders at week 12 vs. non-responders at week 12 was tested using binary logistic Wald test implemented in EPACTS 3.2.6 52 . Association analysis was adjusted to age at diagnosis, sex, azathioprine use, use of aminosalycylates, use of corticosteroids and first four principal components, which were calculated using PLINK 1.9 software 53,54 . Appropriate number of principal components was determined beforehand using gap v1.2.2 R package. Retained were only variants with imputation score (Rsq) > 0.3. Variants ± 100 kb from previously identified differentially expressed genes were further analyzed. Bonferroni-like correction was applied based on empirical autocorrelation determined using coda R package 55 . Statistically significant signal was considered for SNPs with adjusted p-value < 0.05. Linkage disequilibrium was calculated using LDlink software 56 .
In silico functional analysis, prediction value estimation and visualization. Functional analyses were done using HaploReg v4.1 57 , GTExPortal 34 and ENSEMBL 33 . Regional Manhattan plots were constructed using LocusZoom 24 . At SNPs, where chromatin state assignment was listed, predictive value of genotypes was further assessed using randomForest 4.6-14 R package 58 and Receiver Operating Characteristics analysis using pROC R package 59 . All other plots were constructed using ggplot2 R package 23 .
Mendelian randomization analysis. Data was further analyzed using Mendelian randomization integration of eQTL and association analysis data. Previously retained variants ± 100 kb from MMD and ELOVL7 gene regions were further analyzed using GCTA-COJO conditional analysis 60 . GCTA-COJO conditional analysis was performed conditional on most significant SNP identified in previous association analysis. Reference dataset for GCTA-COJO was built based on previously imputed data. Variants with Rsq < 0.3 were removed from the reference dataset prior to analysis. Condition SNP and SNPs, which remained significant after conditioning, were included in subsequent Mendelian randomization analysis. Mendelian randomization was performed using MendelianRandomization v0.5.0 R package and inverse-variance weighted random-effect method 61 . Variant-gene association (eQTL) obtained from GTExportal 34 and ENSEMBL 33 in corresponding tissues/cells was included as exposure and variant-outcome association as outcome for MR analysis. If no eQTL data was available in corresponding tissues/cells, the variant was excluded from MR analysis.
Validation of genes using RT-qPCR. A total of 1 µg of mRNA with RIN > 8 was obtained from 55 (21 non-responders and 34 responders) out of 84 enrolled individuals and was transcribed into cDNA using high capacity cDNA reverse transcription kit (Thermo Fisher, Waltham, MA, USA). mRNA nucleotide sequences of target genes MMD and ELOVL7 were retrieved from NCBI Nucleotide database (www.ncbi.nlm.nih.gov/nucco re/). Primers for reference genes ACTB and B2M were obtained from previous study 17 . Isoform non-specific primers were hand-picked and designed using IDT OligoAnalyzer Tool (eu.idtdna.com/calc/analyzer). Primer sequences and accession numbers are summarized in Table 6. All primers were synthesized by Sigma (Merck, Darmstadt, Germany). Reverse transcription quantitative polymerase chain reaction (RT-qPCR) gene expression assay was carried out using Lightcycler 480 SYBR Green I Master Mix and Lightcycler 480 real time thermocycler (Roche, Basel, Switzerland) according to manufacturer's instructions. 2 µL of 20-fold diluted cDNA (2.5 ng/µL) was used as a template. Primer efficiency was > 90% for all primer pairs. Melting curves for each Table 6. Primer sequences and accession numbers of target and reference genes. www.nature.com/scientificreports/ sample were analyzed after each run in order to confirm specificity of amplification. Raw Ct values were obtained from three run-independent technical replicates for each sample. Normalization of raw data was performed using geometric averaging of reference genes and relative expression was calculated using 2 −ΔΔCt method 62 . Statistics was performed using linear 2 −ΔCt calculation and linear regression was adjusted to same covariates as aforementioned in association analysis. Stability of reference genes was statistically assessed using relative 2 −ΔCt calculation and Mann-Whitney U-Test. eQTLs based on the study data were estimated for SNPs where chromatin state assignment was listed. Estimation was performed using linear regression. Violation of linear regression assumption was considered if variance inflation factor > 10 and condition index > 30. Correlation between gene expressions was assessed using Spearman's rank correlation. License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creat iveco mmons .org/licen ses/by/4.0/.