Genetic regulation of gene expression of MIF family members in lung tissue

Macrophage migration inhibitory factor (MIF) is a cytokine found to be associated with chronic obstructive pulmonary disease (COPD). However, there is no consensus on how MIF levels differ in COPD compared to control conditions and there are no reports on MIF expression in lung tissue. Here we studied gene expression of members of the MIF family MIF, D-Dopachrome Tautomerase (DDT) and DDT-like (DDTL) in a lung tissue dataset with 1087 subjects and identified single nucleotide polymorphisms (SNPs) regulating their gene expression. We found higher MIF and DDT expression in COPD patients compared to non-COPD subjects and found 71 SNPs significantly influencing gene expression of MIF and DDTL. Furthermore, the platform used to measure MIF (microarray or RNAseq) was found to influence the splice variants detected and subsequently the direction of the SNP effects on MIF expression. Among the SNPs found to regulate MIF expression, the major LD block identified was linked to rs5844572, a SNP previously found to be associated with lower diffusion capacity in COPD. This suggests that MIF may be contributing to the pathogenesis of COPD, as SNPs that influence MIF expression are also associated with symptoms of COPD. Our study shows that MIF levels are affected not only by disease but also by genetic diversity (i.e. SNPs). Since none of our significant eSNPs for MIF or DDTL have been described in GWAS for COPD or lung function, MIF expression in COPD patients is more likely a consequence of disease-related factors rather than a cause of the disease.


Scientific Reports
| (2020) 10:16980 | https://doi.org/10.1038/s41598-020-74121-w www.nature.com/scientificreports/ sputum, and in macrophages present in bronchoalveolar lavage of COPD patients compared to healthy smokers and (non-smoker) controls 10,11 . However, other studies showed lower levels of MIF in serum of COPD patients (GOLD stages II-IV) compared to controls 12 , and also lower plasma MIF levels in COPD patients compared to healthy smokers 13 . Interestingly, MIF expression levels were shown to be influenced by the MIF-794 CATT [5][6][7][8] microsatellite (rs5844572), in which 5, 6, 7, or 8 repeats of the CATT sequence could be found and the 5-CATT repeat (CATT 5 allele) leads to the lowest level of MIF expression under basal or stimulated conditions 14 . Additionally, it was reported that the MIF-794 CATT 5 allele was associated with a lower diffusion capacity in COPD patients 15 . Therefore, genetic variation may explain some of the differences found for MIF expression in COPD and may also influence disease severity as defined by the level of diffusion capacity. D-Dopachrome tautomerase (DDT, also known as MIF-2) is another member of the MIF protein superfamily that has been suggested to play similar roles to those of MIF 16 . However, studies linking DDT to COPD are lacking. Furthermore, the human genome also encodes a gene known as DDT-like (DDTL), which according to genomic records appears to be primarily present in primates 17 . DDTL shows high sequence similarity with DDT (approximately 80%), and is located in close proximity to DDT and MIF. To date, nothing is known about the biological function of DDTL or its expression in lung tissue.
Due to the association of MIF with numerous chronic inflammatory diseases, the scientific community currently has an interest in generating MIF inhibitors to fight chronic diseases 18 . For COPD however, given the inconsistent reports, it is not clear yet whether inhibiting or mimicking MIF would be beneficial. Therefore, it is important to attain more clarity on the levels and regulation of MIF and other MIF family members in lung tissue in COPD.
Here, we aimed to investigate the gene expression levels of the MIF family members MIF, DDT and DDTL in lung tissue of patients with and without COPD and to assess whether their gene expression is regulated by single nucleotide polymorphisms (SNPs).

Results
Gene expression of MIF family members in lung samples. We first compared gene expression levels of MIF, DDT and DDTL in lung tissue samples from a subset of subjects from the lung tissue dataset, with (n = 276) and without COPD (n = 236). An overview of the lung tissue dataset and methods used in our study are shown in Fig. 1. Clinical characteristics of the subjects included for the gene expression analysis are presented in Table 1. Gene expression was adjusted for gender, age and smoking status. We found significantly higher expression of MIF (p-value = 0.0017, Fig. 2a) and DDT (p-value = 0.0001, Fig. 2b) in subjects with COPD compared to those not having COPD, and no significant differences for DDTL expression (Fig. 2c). The higher expression of MIF and DDT in COPD was driven by patients with COPD GOLD stage 4 for MIF and COPD GOLD stage 2 and 4 for DDT (Fig. S1). In contrast, we did not find biologically meaningful correlations with either FEV 1 or FEV 1 /FVC (no r values above 0.1) when either analyzing non-COPD and COPD patients together or separately. eQTL analysis of the lung tissue dataset. We then investigated whether MIF, DDT and DDTL expression levels were influenced by the presence of SNPs. To this end, we performed a cis-eQTL analysis 19 for these three genes using the entire lung tissue dataset (n = 1087), which contains mostly patients with COPD with or without lung cancer, lung cancer patients with normal lung function (Non-COPD controls), and a few patients with a variety of interstitial lung diseases. A schematic representation of the step-by-step approach for the eQTL, subsequent analyses and the main results are shown in Fig. 3a. We only included subjects with both gene expression and genotype data available from the groups described previously 20 .
We found 206 significant SNPs regulating gene expression (eSNPs) in lung tissue, 72 eSNPs for MIF and 134 eSNPs for DDTL, with p-values as low as 4.09E −31 and 1E −314 , respectively. No significant eSNPs were found for DDT. Subsequently, in order to clear the redundant SNPs that are inherited as a cluster, we organized the significant eSNPs in linkage disequilibrium (LD) blocks. We tested this with an LD analysis and found 11 LD blocks and 19 additional independent signals among the 72 eSNPs of MIF and 19 LD blocks and 25 independent signals among the 134 eSNPs of DDTL. From the SNPs regulating MIF, 71 out of 72 were also regulating DDTL (11 LD blocks and 18 single signals). All significant SNPs found in our study are shown in Table 2. One of the strongest eSNP blocks for both MIF and DDTL was represented by rs5751777, which significantly influences expression levels of both MIF and DDTL as seen in Fig. 3b. We replicated these significant eSNPs using the Genotype-Tissue Expression (GTEx) project 21 , a comprehensive public resource to study tissue-specific gene expression. For this analysis we included only the significant GTEx eQTL data from lung tissue (n = 383). We confirmed 34 of 134 eSNPs for DDTL with the same direction of effect (3 LD blocks and 8 independent signals). For MIF no eSNPs were confirmed in GTEx with the same direction of effect, whereas we did identify 57 significant MIF eSNPs in GTEx with opposite effects (10 LD blocks and 12 independent signals).

MIF splice variants and spliceQTL analysis.
A puzzling finding was the difference in effect direction between our study and the GTEx study for the 57 significant MIF eSNPs. Considering that our study used microarray and the GTEx study used RNA-Seq to assess gene expression, we investigated the binding site of the microarray MIF probe set used in our study. MIF has three known splice variants: one protein-coding splice variant which contains 3 exons, and two non-coding splice variants that either retain an intron between exon 1 and 2 or between exon 2 and 3 (Fig. 4a). Interestingly, the MIF probe set in our assay targeted the splicing junction Exon2-Exon3 (Fig. 4b), thus detecting only splice variants 1 and 3. Since RNA-Seq would include all splice variant, the opposite eQTL effects may be due to selective expression of MIF splice variants.
To first confirm the direction of the eQTL effect found in our study with the Exon2-Exon3 junction, we used a previously reported dataset of airway wall biopsies in which the exon-exon reads data were present 22 Fig. 5. We found that in airway wall biopsies, rs5751777 significantly regulated MIF split read between Exon 2-Exon 3 with the CC genotype leading to higher expression (Fig. 6a), which is the same direction as in our lung tissue dataset. Additionally, rs5751777 influenced the MIF split read between Exon1-Exon 2 in the same direction. Thus indicating that the protein-coding MIF splice variant, which contains both exon-exon junctions, is increased by the CC genotype of rs5751777, while the other non-coding splice variants are influenced in the www.nature.com/scientificreports/ opposite direction. To test this hypothesis we next assessed the eQTL effect of rs5751777 on total MIF expression (which includes the protein-coding splice variant and both non-coding splice variants of MIF) on the same airway wall biopsies samples. We found indeed a lower expression in the CC genotype for all splice variants together, with the same direction as reported in GTEx (Fig. 6b).

SNPs in genome-wide association studies (GWAS).
Lastly, we evaluated whether the eSNPs found in our study were reported by the GWAS for COPD or FEV 1 listed in Table 3. None of our significant eSNPs for MIF and DDTL (or their LD partners) have been described in GWAS for COPD or lung function.

Discussion
The primary objective of this study was to evaluate gene expression of the MIF family members MIF, DDT and DDTL in lung tissue of COPD patients compared to non-COPD subjects and to elucidate whether MIF expression in lung tissue is regulated genetically by SNPs. We found higher gene expression levels of MIF and DDT in lung tissue samples of COPD patients, compared to non-COPD subjects. While previous studies have not assessed MIF levels in lung tissue, higher levels of MIF have also been reported in serum, sputum and macrophages of bronchoalveolar lavage of COPD patients 10,11 . However, other studies have also detected lower levels of MIF in serum and plasma of COPD patients compared to controls 12,13 , which could be due to patient selection or the different nature and origin of the samples these studies used. MIF has been described as a proinflammatory cytokine, but there is also evidence that MIF can exert many other functions related to cell survival and anti-apoptosis 12,23,24 . Interestingly, MIF may actually be involved in tissue repair more than in promoting inflammation, but its role varies across lung diseases as discussed in detail in our recent review 9 . In fact, it was previously reported that MIF-deficient mice spontaneously develop age-related emphysema 13 , which suggests that in lung tissue MIF may  www.nature.com/scientificreports/ protect against tissue destruction. The higher MIF expression we found in COPD could therefore be a result of activation of protective mechanisms triggered to combat tissue destruction in COPD development. Due to the various shared biological functions between MIF and DDT, and the observed higher MIF and DDT expression in COPD, it seems likely that DTT may also play a protective role in the lung. However, this is still a hypothesis and requires further testing. We did not find differences in expression between COPD and non-COPD for DDTL, but our eQTL results suggest that the regulation of DDTL expression resembled that of MIF. Considering that MIF and DDTL are located in close proximity (less than 0.1 Mb between the two genes), it is not surprising that our cis-eQTL analysis, which identifies SNPs within 1 Mb from the binding site of the probe, found SNPs regulating the expression of both genes, indicating possible dual regulation. Surprisingly, we did not find significant eSNPs for DDT, which is also in the vicinity of MIF and DDTL. This suggests that there are different regulatory mechanisms between MIF and DDT and that DDT expression is not regulated by the genetic mechanisms included in our study.
To validate our findings, we replicated our eQTL results with the publicly available GTEx dataset. While we confirmed a group of eSNPs for DDTL, to our surprise we found a group of significant eSNPs for MIF with opposite effects in our study and in the GTEx dataset. The most likely explanation for these opposite effects of MIF eSNPs is that different splice variants were present in the quantitation of gene expression in our study and in the GTEx. The MIF probe set in the gene expression array of our lung tissue dataset binds across an exon-exon   Fig. 4b) and thus can only detect splice variants 1 and 3 and not splice variant 2 (Fig. 4a). The GTEx study assessed gene expression by RNAseq but did not perform a spliceQTL analysis to study the effect of SNPs on each of the splice variants. Focusing on one of the most significant SNP blocks for MIF and DDTL, rs5751777, and using the airway wall biopsy dataset we were able to assess the effect of this SNP on the expression of all splice variants combined or just variants for each exon-exon junction. We confirmed that the direction  www.nature.com/scientificreports/ of the eQTL effect for the sets of MIF splice-variants is opposite to that of all MIF splice variants combined and that the effect of the total MIF matches the direction of the effect found by GTEx. Even though we could not investigate this splice variant effect in our lung tissue dataset directly, due to the lack of splice variant data, these results do show that the effect of rs5751777 on MIF expression has the same direction in the airway wall dataset and in the lung tissue dataset, suggesting the difference with GTEx is indeed cause by splice variants. While this was only tested for rs5751777, this effect also affected all other MIF eSNPs described in Table 2. This highlights the complexity of genetic studies and the importance of measuring only protein-coding splice variants for more relevant interpretation of expression data. Nonetheless, it is unknown whether the non-coding splice variants of MIF have a different yet relevant molecular function, which may be a point of interest for future studies on MIF.
We found higher MIF expression in COPD patients and also found that MIF expression can be regulated by SNPs, we therefore investigated whether those SNPs or their LD partners are genetically predisposing individuals for the development of COPD. To that end we evaluated whether any of the significant eSNPs found in our study were reported in GWAS for COPD or lung function, available in the GWAS catalog from the European Bioinformatics Institute. None of our significant eSNPs for MIF or DDTL have been described in GWAS for COPD or lung function, suggesting that these SNPs are not predominant in COPD patients and most likely do not confer susceptibility for the development of COPD. It is therefore unlikely that the differences in MIF mRNA expression are due to a predominant presence of these SNPs in COPD patients but could be due to a combination of the SNP and environment. We hypothesize that the differential expression in MIF and DDT between COPD and control patients may be the result of epigenetic regulation, likely caused by multiple factors, but this theory requires further testing.
We also found that the higher MIF and DDT expression in COPD patients was driven by GOLD stage 4 for MIF and stages 2 and 4 for DDT, but no correlation with FEV 1 of FEV 1 /FVC. The fact that MIF knock out mice develop emphysema suggests that the role of MIF may indeed be more related to parenchymal lung tissue than airways 13 . This peripheral role is also suggested by the finding that rs5844572, the MIF-794 CATT 5-8 microsatellite, was associated with low diffusion capacity and incidentally is in high LD with the LD block rs5751759 we identified for MIF and DDTL. Unfortunately, no data on diffusion capacity were available in our datasets to confirm said association. Nonetheless, this suggests that the eSNPs located in this LD block not only regulate MIF and DDTL gene expression but are also linked to a low diffusion capacity, which is associated with disease severity in COPD patients.
In summary, we have shown that COPD patients have higher mRNA expression levels of MIF and DDT and similar DDTL expression in lung tissue, compared to non-COPD subjects. In addition, we have shown that expression of MIF and DDTL in lung tissue is at least partially controlled genetically and some of these eSNPs are shared between these two genes. This is interesting because little is known about the biological function of DDTL and this provides a basis for understanding the regulation of DDTL expression. Moreover, we found that eSNPs for MIF, as demonstrated for rs5751777, can have a significant effect on gene expression but that the direction of such effect is influenced by the MIF splice variants included in the analysis.
Due to the lack of protein data in our datasets we do not know whether higher MIF and DDT gene expression translates into higher protein levels. Moreover, we did not have epigenetic data in our datasets, which would have allowed us to look into possible epigenetic mechanisms differentially regulating MIF and DDT gene expression between COPD and non-COPD patients. Therefore, there is need for further investigation to identify the main cause and reason for the higher expression of MIF and DDT in lung tissue of COPD patients. Given the complexity of COPD pathogenesis and the fact that MIF has been shown to be involved in diverse cellular processes, we believe that the high levels of MIF in lung tissue of COPD patients could be the consequence of multiple factors associated with this disease (e.g. injury, oxidative stress, cellular senescence). Based on our current knowledge In conclusion, MIF expression is not only influenced by the presence of disease (COPD) but also by the eSNPs we identified here. While these SNPs do not appear to be the cause of the gene expression differences observed in our cohort, our data suggest that genetic diversity (i.e. SNPs) could contribute to the discrepancies in the MIF levels reported in COPD studies. Future studies aiming to assess MIF levels and their association with diseases should take into consideration the SNPs reported in our study, as they can have an additional effect on the gene expression levels already influenced by pathological processes. Moreover, it is important to consider that the direction of the effect of SNPs on MIF expression is influenced by the MIF splice variants detected and care should be taken to distinguish between protein-coding and non-coding variants.

Materials and methods
Lung tissue dataset. We used the lung tissue dataset from the Universities of Groningen, Laval and British Columbia 25 . A description of sample collection, demographics of the dataset and gene expression and genotyping analysis has been published previously 19,25 . Briefly, lung tissue was collected from patients with diverse lung diseases who underwent lung resection surgery at Laval University, University of British Columbia, and Table 3. List of GWAS on COPD and lung function reported by the GWAS catalog and analyzed in our study.

First author
Year Study Disease/trait References

Pillai, SG 2009
A genome-wide association study in chronic obstructive pulmonary disease (COPD): identification of two major susceptibility Loci COPD 26 Siedlinski, M 2011 Genome-wide association study of smoking behaviours in patients with COPD Lifetime average and current cigarettes per day, age at smoking initiation, and smoking cessation in COPD 27 Cho, MH 2012 A genome-wide association study of COPD identifies a susceptibility locus on chromosome 19q13 COPD 28 McDonald, ML 2014 Common genetic variants associated with resting oxygenation in chronic obstructive pulmonary disease Resting oxygen saturation [SpO 2 ] in COPD 29 Smolonska, J 2014 Common genes underlying asthma and COPD? Genome-wide analysis on the Dutch hypothesis COPD; asthma 30 Dijkstra, AE 2015 Dissecting the genetics of chronic mucus hypersecretion in smokers with and without COPD Chronic mucus hypersecretion in heavy smokers with and without COPD 31 Hobbs, BD 2017 Genetic loci associated with chronic obstructive pulmonary disease overlap with loci for lung function and pulmonary fibrosis COPD 32 Sakornsakolpat, P 2019 Genetic landscape of chronic obstructive pulmonary disease identifies heterogeneous cell-type and phenotype associations COPD 33 Lutz, SM 2019 Common and rare variants genetic association analysis of cigarettes per day among ever-smokers in chronic obstructive pulmonary disease cases and controls Average cigarettes per day in COPD 34 Repapi, E 2010 Genome-wide association study identifies five loci associated with lung function Lung function (FEV1 and FEV1/FVC) 35 Hancock, DB 2010 Meta-analyses of genome-wide association studies identify multiple loci associated with pulmonary function Lung function (FEV1 and FEV1/FVC) 36 Yao, TC 2014 Genome-wide association study of lung function phenotypes in a founder population Lung function (FEV1, FVC and FEV1/FVC) 37 Liao, SY 2014 Genome-wide association and network analysis of lung function in the Framingham Heart Study Lung function (FEV1 and FVC) 38 Lutz Genes and pathways underlying susceptibility to impaired lung function in the context of environmental tobacco smoke exposure FEV1 in environmental tobacco smoke 43 Suh, Y 2017 Genome-wide association study for genetic variants related with maximal voluntary ventilation reveals two novel genomic signals associated with lung function Lung function (inspiratory muscle strength -maximal voluntary ventilation) 44 Wyss, AB 2018 Multiethnic meta-analysis identifies ancestry-specific and crossancestry loci for pulmonary function Lung function (FEV1, FVC and FEV1/FVC) 45 Li, X 2018 Genome-wide association study of lung function and clinical implication in heavy smokers Lung function (PostBD FEV1 and FEV1/FVC ratio) 46 New genetic signals for lung function highlight pathways and chronic obstructive pulmonary disease associations across multiple ancestries subjects passed all quality controls for DNA and RNA analysis and were included in our study. An overview of this dataset and associated methods is shown in Fig. 1. This dataset has been deposited in the National Center for Biotechnology Information's Gene Expression Omnibus repository and is accessible through GEO Series accession number GSE23546.
Airway wall biopsy dataset. A description of sample collection and demographics of this dataset has been published previously 22 . Briefly, bronchial biopsies were taken from segmental divisions of the main bronchi from healthy subjects and asthmatic patients. All protocols were approved by the University Medical Center Groningen medical ethics committee and all subjects provided written informed consent. RNA was isolated from the biopsies using AllPrep DNA/RNA Mini kit (Qiagen) and RNA samples were further processed using the TruSeq Stranded Total RNA Sample Preparation Kit (Illumina). The cDNA fragment libraries were then loaded in pools of multiple samples unto an Illumina HiSeq2500 sequencer. The gene level quantification was performed by HTSeq (version 0.6.1p1) using Ensembl version 75 as gene annotation database. DNA from blood samples was genotyped on Illumina genotyping platforms and submitted to the Michigan imputation server. Imputation was performed using the HRC r1.1 2016 reference panel, the Eagle v2.3 Phasing algorithm, the EUR (European) population parameter and the quality control plus imputation mode. From this dataset, the samples from 148 subjects passed all quality controls for DNA and RNA analysis and were included in our study. An overview of this dataset and associated methods is shown in Fig. 5.

Gene expression analysis.
For the gene expression analysis of the current study, data from the lung tissue dataset was used. Unfiltered gene expression was normalized with the Robust Multichip Average method implemented in the Affymetrix Power Tools softwareV.1.8.5. The Log2(microarray intensity) values of gene expression are used for all subsequent analyses. Gene expression data were adjusted for age, gender and smoking status, using a robust linear model. For gene expression analysis (MIF, DDT and DDTL) a subset of COPD patients and non-COPD control subjects from the lung tissue dataset was selected. COPD was defined as an FEV 1 / FVC ratio < 70%. Non-COPD was defined as an FEV 1 /FVC ≥ 70% predicted and subjects were selected to match COPD patients as closely as possible on age, gender and smoking status. From both groups, current and ex-smokers of > 40 years with ≥ 5 pack-years were included. For FEV 1 and FEV 1 /FVC, pre-bronchodilator values were used. Subjects with other lung diseases such as asthma, cystic fibrosis or interstitial lung diseases were excluded. Gene expression values in COPD and non-COPD patients were tested for normal distribution with a Kolmogorov-Smirnov test. These data did not have a normal distribution; therefore differences in gene expression levels between groups were tested with a Mann Whitney U test and correlations between gene expression and other parameters were performed with a nonparametric Spearman correlation analysis.
Expression quantitative trait loci (eQTL) analysis. To identify SNPs significantly regulating gene expression of MIF, DDT and DDTL, a cis-eQTL analysis was performed. Here cis-eQTL is defined as the SNPs significantly associated with gene expression and located within 1 Mb from either side of the binding side of the probe. The eQTL analysis was performed using the lung tissue dataset, as described previously 19 . Briefly, the association between SNPs and the 2-log transformed gene expression of MIF, DDT and DDTL was tested in each cohort (Laval, British Columbia and Groningen). Subsequently, an eQTL for all cohorts was calculated and a Bonferroni-adjusted p-value < 0.05 was used as a significance threshold to correct for multiple testing. An overview of the methods and the step-by-step approach are shown in Figs. 1 and 3a, respectively.
Linkage disequilibrium analysis. Linkage disequilibrium (LD) between eSNPs for MIF and DDTL was tested with the LDlink tool from the National Institutes of Health of the United States (https ://ldlin k.nci.nih. gov), for European populations using an R 2 threshold of 0.8. The SNPs belonging to the same LD block (R 2 > 0. 8) were clustered together and the remaining SNPs were catalogued as independent signals.
MIF splice variants and spliceQTL. The effect of eSNPs on MIF splice variant expression was performed with the airway wall biopsy dataset. Using the MatrixEQTL package it was determined whether SNP allele dosages were associated with split read counts of splice junction pairs within 1 Mb of the SNP (Cis-SpliceQTL). Predicted dosage of the alternative allele was used as the explanatory variable and age, gender and current smoking status were set as covariables. SNPs with a minor allele frequency lower than 0.05 were removed. A custom script was used to quantify split reads in the sequence alignment map files, as identified by the N-operation in the CIGAR string. Splice junction pairs identified by split reads were grouped by strand, start and end position of the intron and annotated to the host gene. Number of split reads of a given junction pair were normalized per sample by correcting for variation in library size and transcript abundance in a gene-wise fashion.
Scientific Reports | (2020) 10:16980 | https://doi.org/10.1038/s41598-020-74121-w www.nature.com/scientificreports/ Identification of significant eSNPs in GWAS for COPD. To identify the presence of the significant eSNPs from our study in GWAS for COPD, the LD partners of all SNPs found in our study were identified with the LDproxy tool of the National Institutes of Health of the United States (https ://ldlin k.nci.nih.gov) for European populations. All significant SNPs found in our study and all proxy variants reported by the LDproxy analysis were cross-referenced with the list of SNPs reported in the GWAS catalog from the European Bioinformatics Institute (https ://www.ebi.ac.uk/gwas/; reporting SNPs with p-value < 1 × 10 -5 ) found by the GWAS for FEV 1 and COPD listed in Table 3.

Data availability
The lung tissue dataset analyzed in the current study is available in the National Center for Biotechnology Information's Gene Expression Omnibus repository and is accessible through GEO Series accession number GSE23546. Sequence data from the bronchial biopsy dataset analyzed in the current study has been deposited at the European Genome-phenome Archive (EGA), which is hosted by the EBI and the CRG, under accession number EGAS00001003735.