Introduction

Allogeneic hematopoietic stem cell transplantation (HSCT) is a well-established curative treatment for many hematological malignancies. Graft-versus-host disease (GvHD) is the major life-threatening complication of HSCT. GvHD is mediated by donor immune cells in the graft, which recognize the patient’s tissues as foreign and destroy them. As the recipient is immunocompromised due to conditioning and immunosuppressive medication, the immune system of the recipient is not able to kill the foreign cells of the graft. GvHD occurs in 20–50% of HSCTs1.

One critical step in GvHD initiation is extensive immune activation due to microbial antigens that leak from the gastrointestinal tract as a result of conditioning. The microbial antigens are detected by antigen-presenting cells (APCs) via pathogen-associated molecular pattern (PAMP) receptors, such as Toll-like receptors (TLRs) and NODs, leading to the activation of these cells. The activation of APCs leads to a cytokine storm, i.e., the production of high levels of cytokines, such as interleukin (IL)6, interferon (IFN)γ, IL23, and tumor necrosis factor (TNF), which, in turn, activate other immune cells1,2,3.

The outcome of HSCT is strongly influenced by the genetic differences between recipient/donor pairs4. The golden rule is genetic similarity or identity in the human leukocyte antigen (HLA) genes located in the major histocompatibility complex (MHC) on chromosome 6. However, there is evidence that HLA identity is not sufficient to prevent GvHD. Minor histocompatibility antigens5, mismatches in gene deletions6, non-HLA polymorphisms in immunoregulatory molecules7,8,9,10,11,12, drug-metabolizing genes13,14, and regulatory elements, e.g., non-coding RNAs11, affect the risk of adverse outcomes15. In fact, GvHD can be regarded as a multifactorial trait with a genetic component, in which HLA matching is a crucial but not sufficient factor. As noted by Warren et al.4, while clinical HLA testing (sequencing of certain HLA gene exons) certainly focuses on functionally relevant variations, it encompasses only 1/1000th of the entire MHC sequence, or one millionth of the whole genome.

Genome-wide association (GWA) studies provide a systematic view of the genetic architecture and genetic risk markers of GvHD. One challenge for GWA studies in HSCT is diagnostic and treatment heterogeneity and the fact that the outcome may depend on properties of both the donor and recipient. It has been estimated that at least a few thousand HSCTs should be studied to reach sufficient statistical power for a GWA study4,15. To begin to tackle this challenge, we used an alternative approach described by Chien et al.10 and screened previously reported genetic associations in two additional populations with the rationale that, if replicated independently in many populations, the association may be genuine. We evaluated the previously reported GvHD-associated single nucleotide polymorphisms (SNPs) in two HLA-matched sibling allogeneic HSCT cohorts derived from Finnish and Spanish populations. To obtain coherent genetic information, the Immunochip array data were imputed and strictly filtered. We further analyzed the downstream functional effects of the associated SNPs to identify the disease-related biological pathways involved.

Results

Selection of the candidate SNPs

The 40 SNPs discovered by Chien et al.10 were included in the present analysis. The PubMed literature search from April 30, 2011 to January 31, 2017 identified 26 studies reporting an additional 82 SNPs associated with GvHD. Of these 122 SNPs, 50 and 64 were found in the imputed and filtered Finnish and Spanish genotype datasets, respectively. The complete list of analyzed SNPs is presented in Supplementary Tables 1 and 2, and detailed data of imputed and genotyped SNPs associated with GvHD in the current study are depicted in Supplementary Tables 2 and 3. All Hardy–Weinberg equilibrium (HWE) P-values were >6 × 10⁻⁴ and no call rate was less than 93%. Metrics depicting the success of imputation were close to 1, indicating that these SNPs were imputed with high certainty.

Candidate SNPs associated with acute and chronic GvHD in the Finnish cohort

In the Finnish cohort, we evaluated the association between 50 candidate SNPs and both acute GvHD (aGvHD) and chronic GvHD (cGvHD) outcomes. A summary of these associations at an α-level <0.05 is presented in Table 1.

Table 1 Candidate SNPs associated with acute and chronic GvHD in the Finnish cohort.

Two recipient and four donor genotypes displayed an association with aGvHD. These results demonstrate an association between recipient rs2523957 (MICD) and rs7294 (PRSS53/VKORC1) and an increased risk of aGvHD (P = 0.001 and 0.010, respectively). Among the donors, SNPs rs3917225 (IL1R1, P = 0.022), rs2523957 (MICD, P < 0.001), rs1800629 (TNF, P = 0.029), and rs2233409 (NFKBIA, P = 0.031) predisposed recipients to an adverse outcome. It is important to note that the minor allele G at rs2523957 (pseudogene MICD) was associated with an increased risk of aGvHD in both the recipient (odds ratio [OR] 2.82, 95% confidence interval [CI]: 1.51–5.27, P = 0.001) and donor (OR 2.75, 95% CI: 1.55–4.87, P < 0.001) genotypes.

Two recipient and two donor genotypes showed associations with cGvHD. Recipient genotypes at rs16944 (IL1B) and rs6500328 (NOD2) were associated with a borderline increased risk of cGvHD (P = 0.047 and 0.035, respectively). The donor missense genotype at rs2075800 (HSPA1L) displayed a borderline protective association with cGvHD (OR 0.62, 95% CI: 0.38–0.99, P = 0.046). The synonymous variant rs1137282 (KRAS) was associated with a reduced risk of cGvHD (OR 0.46, 95% CI: 0.24–0.88, P = 0.017).

Candidate SNPs associated with acute and chronic GvHD in the Spanish cohort

A summary of the associations identified between candidate SNPs and aGvHD and cGvHD in the Spanish cohort is presented in Table 2. In total, 64 SNPs were evaluated. Table 2 shows the associations with an α-level <0.05. In the Spanish cohort, four recipient and six donor genotypes were associated with aGvHD. In both the recipient and donor genotypes, the minor allele A at rs2800230 (not located in a gene) was associated with protection from aGvHD (recipient OR 0.57, 95% CI: 0.34–0.95, P = 0.031; donor OR 0.58, 95% CI: 0.35–0.95, P = 0.029). The IL1B SNP rs1143634 (P = 0.012) and the IL1A-annotated SNP rs1800587 (P = 0.031) conferred susceptibility to aGvHD in the recipient genotype. The minor allele G at rs2862833 (Fas cell surface death receptor, FAS) was associated with protection from aGvHD (OR 0.52, 95% CI: 0.31–0.87, P = 0.013).

Table 2 Candidate SNPs associated with acute and chronic GvHD in the Spanish cohort.

In Spanish donors, IL10 promoter SNPs rs1800872 and rs1800871 increased the risk of aGvHD (OR 2.19, 95% CI: 1.33–3.63, P = 0.002 for both SNPs). The IL10RB missense variation at rs2834167 was also associated with an increased risk of aGvHD (OR 1.96, 95% CI: 1.17–3.28, P = 0.010). However, the minor allele C of the IL10 promoter SNP rs1800896 displayed a protective association with the disease (OR 0.48, 95% CI: 0.29–0.80, P = 0.004).

Two donor genotype TLR9 SNPs (rs352140, P = 0.018 and rs352139, P = 0.023) were associated with protection from cGvHD. An adverse association with cGvHD was found in cases with the recipient genotype IL23R missense SNP rs11209026 (OR 2.61, 95% CI: 1.13–6.04, P = 0.020).

Expression quantitative trait loci analysis of candidate SNPs

The downstream effects of GvHD-associated SNPs on mRNA expression were determined using the Blood expression quantitative trait loci (eQTL) Database generated by Westra et al.16. When analyzing all 21 SNPs showing an association with aGvHD or cGvHD in the current study, several SNPs were determined to affect the expression of nearby genes (Tables 3 and 4). No significant trans-eQTL effects were detected.

Table 3 eQTL analysis of candidate SNPs in the Finnish cohort.
Table 4 eQTL analysis of candidate SNPs in the Spanish cohort.

In the Finnish cohort (Table 3), SNP rs2075800, located within the heat shock protein A1-like (HSPA1L) gene, protected individuals from cGvHD and was associated with increased expression of HSPA1L and the adjacent HSPA1B gene, with an FDR level <0.05 (HSPA1B Z-score 20.06, P = 1.78 × 10⁻⁸⁹; HSPA1L Z-score 8.17, P = 3.02 × 10⁻¹⁶). In both the recipient and donor genotypes, aGvHD-predisposing rs2523957 (pseudogene MICD) was associated with reduced expression of HLA-G (Z-score −13.89, P = 7.18 × 10⁻⁴⁴) and increased expression of HLA-F (Z-score 15.64, P = 3.65 × 10⁻⁵⁵). Additionally, the cis-eQTL results for TNF revealed that the disease-predisposing SNP rs1800629 reduced expression of the TNF gene (P = 1.28 × 10⁻⁷). It is also important to note that the HSPA1L, HSPA1B, MICD, and TNF genes are all located in the MHC region. The intronic NOD2 rs6500328 decreased expression of the NOD2 gene (Z-score −22.76, P = 1.10 × 10⁻¹¹⁴) and the minor allele G was identified as a risk factor in the outcome association analysis (Table 1).

In the Spanish cohort, two TLR9-annotated SNPs were associated with GvHD (Table 4). The cGvHD-protective SNPs, rs352140 and rs352139, increased the expression of PPM1M (rs352140 Z-score 17.02, P = 5.90 × 10⁻⁶⁵; rs352139 Z-score 17.2, P = 2.71 × 10⁻⁶⁶). The protective SNP rs2862833, a downstream variant of the FAS gene, increased the expression of FAS (Z-score 10.8, P = 3.51 × 10⁻²⁷) and the STAMBPL1/ACTA2 locus (Z-score 38.25, P = 9.81 × 10⁻¹⁹⁸). IL10 promoter SNPs, rs1800872, rs1800871, and rs1800896, did not demonstrate any significant cis-eQTL association. The aGvHD-predisposing IL10RB missense SNP rs2834167 was associated with increased expression of IL10RB (Z-score 7.01, P = 2.44 × 10⁻¹²).

Cytokine QTL associations of candidate SNPs

The cytokine storm plays an important role in the initiation phase of GvHD. Therefore, we examined cytokine QTL effects of the associated SNPs by utilizing the cytokine QTL database recently published by Li et al.17. The results are presented in Tables 5 and 6.

Table 5 Cytokine QTL associations of candidate SNPs in the Finnish cohort.
Table 6 Cytokine QTL associations of candidate SNPs in the Spanish cohort.

In the Finnish cohort, six of seven SNPs associated with an increased risk of GvHD were linked with alterations in the production of IL6 and IFNγ by peripheral blood mononuclear cells (PBMCs) at an alpha level <0.05 (Table 5). For the predisposing genotypes at the MICD and PRSS53/VKORC1 loci, IL6 production was altered following stimulation with the fungi Cryptococcus and Candida albicans. Changes in the production of IFNγ emerged following stimulation with Bacteroides, Cryptococcus, Staphylococcus aureus, and C. albicans in carriers of a risk allele at IL1R1, TNF, IL1B, or NOD2, respectively. Donor genotype T at the HSPA1L-annotated SNP rs2075800 was associated with IFNγ production following stimulation of whole blood with phytohaemagglutinin (P = 0.007).

In the Spanish cohort, the majority of cytokine QTL associations focused on the IL1β and IFNγ responses of stimulated PBMCs (Table 6). Recipient SNPs at the IL1A and IL1B aGvHD risk loci were significantly associated with altered production of IFNγ by PBMCs following stimulation with C. albicans (rs1800587 and rs1071676, P < 0.05 for both) or Cryptococcus (rs1143634, P = 0.030). Donor aGvHD-predisposing IL10 promoter region genotypes at rs1800872 and rs1800871 were associated with the IL1β response of Escherichia coli-stimulated PBMCs (P = 0.001). However, the aGvHD-protective IL10 SNP, rs1800896, displayed a borderline association with the IFNγ response after stimulation with Borrelia burgdorferi (P = 0.049). The donor minor allele G at rs2834167 (IL10RB), which had an adverse association with aGvHD, was associated with altered production of IFNγ by C. albicans-stimulated PBMCs. The IL6 response of Coxiella burnetii-stimulated PBMCs was altered in carriers of the aGvHD-protective genotype at the FAS locus rs2862833 (P = 0.008).

Discussion

To systematically identify genetic loci that are associated with GvHD, we screened previously reported SNPs for their genetic associations with aGvHD and cGvHD in a total of 492 HLA-matched sibling HSCT recipient-donor pairs. The cohorts were derived from two populations: Finnish and Spanish. The major finding of the present study was that, despite clear heterogeneity in GvHD-associated polymorphisms between the two populations, the markers share a common feature: they are predominantly annotated with genes that are important in the host response to microbial antigens. Furthermore, the functional effects of these polymorphisms were related to the same pathways.

The GvHD-associated genes included IL1, IL10, IL23R, TLR9, TNF, and NOD2, which all play a role in the host response to microbes. However, it was of further interest that the polymorphisms were determined to regulate the expression levels of cytokines IL1β, IL6, and IFNγ, all of which are important mediators of the cytokine storm.

Several SNPs annotated to pathogen recognition receptors (PRRs) have been previously associated with GvHD7,8. In TLR9, which detects intracellular bacterial single-stranded CpG-DNA, we found protective polymorphisms that showed no direct effect on expression of the TLR9 gene in the two QTL databases utilized herein. The intronic risk SNP rs6500328 in the NOD2 gene was associated with reduced expression of the NOD2 gene and was also associated with IFNγ expression in the cytokine QTL database. The cytokine QTL analysis demonstrated complex crosstalk between the associated SNPs, their direct QTL effects, and the response of particular cell populations to microbial antigens. To further support a role for PRRs in GvHD, our unpublished studies suggest that LPS-recognizing TLR4 displays intronic minor alleles at rs12377632 and rs1927907, both of which were associated with GvHD protection and a strong increase in TLR4 expression. The protective TLR4 genotype at rs1927907 was associated with the IL1β response of C. burnetii-stimulated PBMCs. However, their associations with aGvHD have not yet been reported; therefore, these SNPs were not included in the current study and should be further analyzed in other populations.

The use of eQTL and cytokine QTL databases allows for demonstration of the functional or downstream effects of disease-associated polymorphisms. The eQTL analyses performed herein revealed interesting findings and indicated shared pathways. The association between IL10 polymorphisms and GvHD has been established in many populations18,19,20,21 and it has been assumed to be related to different expression levels of IL10. However, the IL10 markers rs1800872 and rs1800871 found to be associated in the present study showed no eQTL effects on IL10 expression; rather, the polymorphisms regulated the IFNγ and IL1β levels produced by PBMCs after stimulation with E. coli or B. burgdorferi. In contrast, the missense polymorphism in IL10RB did regulate the level of IL10RB in the eQTL database and of IFNγ in the cytokine QTL database. Hence, each polymorphism may exert various effects at different steps of the immune response. Unfortunately, the cytokine database in its present form shows no relationship between the allele and the direction of the measured cytokine response, making it difficult to interpret the mechanisms of risk alleles. In this regard, GvHD-specific QTL databases would facilitate the functional interpretation of significant variants.

A number of SNPs showing an association with GvHD risk in the Finnish cohort mapped to the MHC on chromosome 6p21.3. TNF, MICD, and HSPA1L are located relatively close to each other; therefore, the observed associations may be derived from a single genetic polymorphism in linkage disequilibrium with the markers analyzed here. Alternatively, there may be multiple polymorphisms within the MHC segment associated with the disease. For example, emerging evidence indicates multiple novel MHC-associated risk markers for GvHD4,22. Based on the present results, it is not possible to pinpoint which marker is the primary one or closest to the true risk polymorphism. In fact, as long as the causal SNP is unknown, these results only indicate that the genes annotated to the SNPs may influence the risk for GvHD. Differences in linkage disequilibrium between causal and studied SNPs result in discrepancies between results from different populations, as also seen in the present study when compared to the original findings.

In addition to ethnicity and the genotyping array used, the two cohorts investigated in this study also differed from each other with respect to their clinical HSCT setting. The Finnish cohort was from a single center, whereas the Spanish cohort originated from a number of clinics. The stem cell source, conditioning regimen, and GvHD prevention procedures varied significantly and may have contributed to the heterogeneity of the GvHD-associated SNPs. Combining these typical aspects with well-established GvHD risk factors2,3, such as donor and recipient age, transplant gender direction, diagnosis and staging, and infections, may also partially explain why, despite numerous GvHD candidate genes and markers studied in recent years, the consistency of results across studies has been limited.

To date, genome-wide studies in GvHD have been reported only among mostly Caucasian23 and Japanese24 populations. These studies have not reported overlapping or shared risk loci, indicating the heterogeneous nature of GvHD genetics. It will be of interest to test whether the utilization of eQTL or pathway approaches similar to those used in the present study would reveal common mechanisms behind apparently heterogeneous associations. While GWA studies investigating millions of variants require rigorous control for multiple testing, due to the replicative nature of the present study, an alpha value <0.05 was selected as the threshold for statistical significance for allele frequency association with GvHD. The fact that the associated genes were functionally relevant to GvHD can be regarded as additional supportive evidence.
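To make the multiple-testing consideration concrete, the following minimal Python sketch works through the arithmetic for the numbers of tests reported in the Methods (77 in the Finnish and 97 in the Spanish cohort). The function names are illustrative, and the expected count of chance findings assumes independent tests under the global null hypothesis, which is a simplification because some candidate SNPs are in linkage disequilibrium.

```python
ALPHA = 0.05  # nominal significance threshold used in the study


def expected_chance_findings(n_tests, alpha=ALPHA):
    """Expected number of nominally significant tests if no true effect exists
    (assumes independent tests; a simplification)."""
    return n_tests * alpha


def bonferroni_threshold(n_tests, alpha=ALPHA):
    """Per-test threshold that would keep the family-wise error rate at alpha."""
    return alpha / n_tests


# Numbers of tests per cohort as reported in the Methods
tests_per_cohort = {"Finnish": 77, "Spanish": 97}

summary = {
    cohort: (expected_chance_findings(n), bonferroni_threshold(n))
    for cohort, n in tests_per_cohort.items()
}

for cohort, (expected, threshold) in summary.items():
    print(f"{cohort}: ~{expected:.2f} chance findings expected at alpha = {ALPHA}; "
          f"Bonferroni threshold = {threshold:.1e}")
```

Under these assumptions, roughly four to five nominal associations per cohort could arise by chance alone, which is why the functional relevance of the associated genes is treated as supporting evidence.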

As demonstrated by the present study and many other association studies8,9,10,25, individuals may carry genetic factors rendering them more susceptible to react immunologically against foreign structures, such as allogeneic cells or intestinal microbes. Such a high responder genotype may be helpful in clearing infections but may also increase the risk of GvHD. This may be one of the important genetic factors for GvHD risk. We can also assume that an increased risk of GvHD results from insufficient histocompatibility; despite good HLA matching, mismatches in minor histocompatibility antigens may also play a role in GvHD risk5,6. Another possibility is pharmacogenomic differences in the response to or efficiency of the immunosuppressive treatment13. This has been scarcely explored but certainly merits further investigation. Hence, we can outline at least three overlapping models for GvHD genetics that most likely act together.

The present study provides further evidence that genetic variation regulating the level of the immune response against bacterial antigens is an important non-HLA factor in GvHD susceptibility. Although individual associated polymorphisms are not necessarily the same throughout different populations, they consistently belong to the same regulatory pathways participating in cytokine-mediated inflammation of the intestinal epithelium. It is likely that a similar type of heterogeneity can be found in other populations or cohorts, and it remains to be determined whether associated markers also belong to the same biological pathway. This heterogeneity implies that large genome screens may be needed for clinical GvHD predictions, rather than focusing on only a small number of selected genetic markers.

Methods

Literature search

Chien et al.10 identified 41 publications reporting 40 SNPs associated with aGvHD up to April 30, 2011, which were included in our analysis. We also performed a PubMed search using the term “acute GvHD AND polymorphism” to identify published studies reporting an association analysis of genetic variants with GvHD from April 30, 2011 until January 31, 2017. Studies reporting associations at an α-level >0.05 were excluded and, from all of the variant types, only SNPs were selected for further analysis.

Study populations

The SNP association analyses were replicated within two separate populations. The characteristics of all recipients in these cohorts are presented in Table 7. The Finnish cohort consisted of 301 HLA-matched recipient/donor sibling pairs having clinical data and DNA samples sent for genotyping. The cohort also included 11 individual recipients and 8 donors without the respective sibling. All recipients underwent allogeneic HSCT at Helsinki University Hospital, Comprehensive Cancer Center, Stem Cell Transplantation Unit, Finland, between 1993 and 2006. The pairs were matched to low-resolution level at HLA-A, -B, and -DRB1 loci. HLA typing was performed with Lymphotype HLA-AB and Lymphotype HLA-DR-DQ (Bio-Rad Medical Diagnostics), LIPA Reverse Dot Blot (Innogenetics Group), or HLA-SSP (Pel Freez, Dynal Biotech LLC). The present cohort overlapped significantly with those utilized in our previous publications6,12,21,26. After genotyping and imputation, the cohort in the present study included 239 recipient/donor pairs, 23 individual recipients, and 28 individual donors. The majority (>75%) of GvHD prevention procedures combined cyclosporine, steroid, and 3 to 4 doses of methotrexate, while 18% received a combination of cyclosporine and mycophenolate mofetil.

Table 7 Characteristics of the Finnish and Spanish recipients.

The Spanish cohort was composed of 264 HLA-matched recipient/donor sibling pairs having clinical data and DNA samples sent for genotyping. The cohort also included 10 individual recipients and 30 donors without the respective sibling. HLA matching was completed at low-resolution at HLA-A and -B loci and at high-resolution at the HLA-DRB1 locus. Recipients received allogeneic HSCT between 2002 and 2014 at 13 Spanish transplant centers. After genotyping and imputation, the final study cohort was composed of 253 recipient/donor pairs, 15 individual recipients, and 30 individual donors. For GvHD prevention, 57% of recipients received a combination of cyclosporine and methotrexate, 10% received cyclosporine only, and 11% were treated with a combination of cyclosporine and mycophenolate mofetil.

The clinical outcomes examined were severe acute and chronic GvHD. The phenotypes compared were aGvHD grade 0 versus grades III–IV and absent cGvHD versus extensive cGvHD. Local determinations of GvHD grades were used. GvHD was graded according to guidelines established by the European Society for Blood and Marrow Transplantation27,28.

This study conformed to principles of the Declaration of Helsinki and was approved by the Ethics Committee of Helsinki University Central Hospital and the DNA bank of the Spanish Group for Stem Cell Transplantation (GETH). All participants gave written informed consent.

Genotyping and imputation

Genotyping was performed at FIMM Technology Centre, Helsinki, Finland. DNA samples from the Finnish cohort were extracted using the QIAamp DNA Blood Mini Kit (Qiagen) from the white blood cell fraction of peripheral blood samples and sent for HLA typing. The Finnish cohort was genotyped using an Immunochip (Illumina) array comprising 196524 variants in 2013. DNA samples of the Spanish cohort were received from the DNA bank of the GETH. The array utilized for the analysis of Spanish samples from the years 2016–17 was the Infinium® ImmunoArray-24 v2.0 (Illumina), which comprises 253702 variants. Initial quality control identified samples with discordant sex information, duplicate samples, a call rate <97%, and heterozygosity excess <−0.3 (not X chromosome) or >0.2 and >0.1 for the X chromosome.

The autosomal genotype data were imputed with IMPUTE2 using 1000 Genomes Phase 3 as a phased reference panel29. Pre-filtering of the variants and samples was completed according to the methods described by Anderson et al.30. Individuals with a missing genotype rate >3%, variants with a minor allele frequency (MAF) <1%, variants with a missing data rate >5%, and variants with a HWE P-value < 1 × 10⁻⁵ were excluded. The principal components of both cohorts were determined and the imputation procedures were carried out separately. Post-imputation filtering31 excluded variants having an IMPUTE2 INFO-field measure of the observed statistical information <0.5. After post-imputation filtering, 5041081 and 5737173 variants were included in the Finnish and Spanish cohort genotype datasets, respectively. The datasets analyzed during the current study are not publicly available due to limitations of ethical permits which do not allow distribution of personal data, including individual genetic and clinical results.
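The variant-level thresholds above can be expressed as simple predicates. The following Python sketch is only an illustration: the `Variant` record, its field names, and the example values are hypothetical and do not reflect the file formats or tooling actually used, and combining pre- and post-imputation metrics in one record is a simplification. The sample-level filter (missing genotype rate >3%) is not shown.

```python
from dataclasses import dataclass

# Variant-level thresholds quoted in the text
MIN_MAF = 0.01              # exclude MAF < 1%
MAX_VARIANT_MISSING = 0.05  # exclude missing data rate > 5%
MIN_HWE_P = 1e-5            # exclude HWE P-value < 1e-5
MIN_INFO = 0.5              # exclude IMPUTE2 INFO < 0.5


@dataclass
class Variant:
    """Hypothetical per-variant summary record (illustrative only)."""
    rsid: str
    maf: float
    missing_rate: float
    hwe_p: float
    info: float


def passes_pre_imputation_qc(v: Variant) -> bool:
    """Apply the MAF, missingness, and HWE filters."""
    return (v.maf >= MIN_MAF
            and v.missing_rate <= MAX_VARIANT_MISSING
            and v.hwe_p >= MIN_HWE_P)


def passes_post_imputation_qc(v: Variant) -> bool:
    """Apply the imputation-quality filter."""
    return v.info >= MIN_INFO


# Made-up example records, one failing each filter
variants = [
    Variant("rs_demo_1", maf=0.20, missing_rate=0.01, hwe_p=0.40, info=0.98),
    Variant("rs_demo_2", maf=0.005, missing_rate=0.01, hwe_p=0.40, info=0.99),  # MAF too low
    Variant("rs_demo_3", maf=0.20, missing_rate=0.08, hwe_p=0.40, info=0.90),   # too many missing calls
    Variant("rs_demo_4", maf=0.20, missing_rate=0.01, hwe_p=1e-7, info=0.90),   # HWE deviation
    Variant("rs_demo_5", maf=0.20, missing_rate=0.01, hwe_p=0.40, info=0.30),   # poorly imputed
]

kept = [v.rsid for v in variants
        if passes_pre_imputation_qc(v) and passes_post_imputation_qc(v)]
print(kept)
```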

Statistical analyses

The significance of variation between characteristics in the study cohorts was analyzed using the non-parametric Mann–Whitney U-test (recipient and donor age), Pearson’s chi-square test (transplant gender direction, diagnosis, stem cell source, conditioning regimen, and GvHD grade), or Fisher’s exact test (aplastic anemia diagnosis). P-values < 0.05 were considered statistically significant (Table 7).

Principal component analysis (PCA) was used to determine the genetic population structure of the two study cohorts. Non-imputed common SNPs shared by the two cohorts were included in the analysis. The SNPs were pruned to exclude those in strong linkage disequilibrium32. The analysis was executed with the PLINK v1.90b3u (www.cog-genomics.org/plink/1.9/)33 commands --indep-pairwise 50 5 0.8 and --pca, and the result was plotted using R version 3.3.334. A PCA plot of the first two dimensions is presented in Supplementary Figure 1.

The association between the SNPs and acute and chronic GvHD was determined using the chi-square allelic test and is expressed as the OR with the 95% CI (Tables 1 and 2). The frequencies of the recipients and donors are presented in Supplementary Table 5. SNPs with a MAF < 0.01 or an HWE P-value < 1 × 10⁻⁵ were excluded from the analysis. The current study evaluated previously reported results and, therefore, despite multiple tests (77 in the Finnish and 97 in the Spanish cohorts), a P-value < 0.05 was considered to support a statistically significant replication. Statistical analyses were performed with IBM SPSS Statistics version 24 and PLINK 1.0735.
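As an illustration of the allelic test, the sketch below computes the Pearson chi-square statistic (1 degree of freedom, no continuity correction), the OR, and a 95% CI from a 2 × 2 table of allele counts. The CI uses the standard error of the log OR (Woolf's method), which is an assumption about how the intervals were derived; the function name and the allele counts are invented for demonstration and are not study data.

```python
import math


def allelic_test(case_minor, case_major, control_minor, control_major):
    """Allelic chi-square test on a 2 x 2 table of allele counts.

    Returns (odds_ratio, ci_low, ci_high, chi2, p_value).
    """
    a, b, c, d = case_minor, case_major, control_minor, control_major
    n = a + b + c + d

    # Pearson chi-square statistic for a 2 x 2 table, 1 degree of freedom
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

    # Upper-tail probability of a chi-square(1) variable: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))

    # Odds ratio for the minor allele, with a 95% CI on the log scale (Woolf; assumed)
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci_low = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    ci_high = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    return odds_ratio, ci_low, ci_high, chi2, p_value


# Invented allele counts: 50 cases (100 alleles) and 150 controls (300 alleles)
or_, lo, hi, chi2, p = allelic_test(case_minor=30, case_major=70,
                                    control_minor=40, control_major=260)
print(f"OR {or_:.2f}, 95% CI: {lo:.2f}-{hi:.2f}, chi2 = {chi2:.2f}, P = {p:.2g}")
```

Because each individual contributes two alleles, the allelic test treats alleles as independent observations, which implicitly relies on HWE holding in the sample.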

The eQTL analyses (Tables 3 and 4) were performed in February 2017 utilizing the comprehensive Blood eQTL Database (http://genenetwork.nl/bloodeqtlbrowser/) published by Westra et al.16. The database consists of both cis- and trans-eQTL results generated from a meta-analysis of seven studies including 5311 peripheral blood samples and a replication analysis with 2775 samples. Z-scores with a false discovery rate (FDR) <0.05 were considered statistically significant.
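For readers relating the reported Z-scores to P-values, the sketch below applies the standard two-sided normal tail formula. This is a generic statistical relationship rather than the database's documented procedure; values computed this way need not match the database exactly, since the reported Z-scores are rounded and derive from a meta-analysis. The function name is illustrative.

```python
import math


def two_sided_p_from_z(z):
    """Two-sided P-value for a standard normal Z-score: P(|Z| > |z|)."""
    return math.erfc(abs(z) / math.sqrt(2))


# Sanity check against the familiar 5% critical value, then the Z-score
# reported for IL10RB (7.01)
p_critical = two_sided_p_from_z(1.96)
p_il10rb = two_sided_p_from_z(7.01)
print(f"Z = 1.96 -> P = {p_critical:.3f}")
print(f"Z = 7.01 -> P = {p_il10rb:.2e}")
```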

The cytokine QTL database (https://hfgp.bbmri.nl/), recently published by Li et al.17, combines host genetics with cytokine production after various microbial stimuli. The effects of candidate SNPs on the cytokine response were analyzed in March 2017 (Tables 5 and 6). P-values < 0.05 were considered to be statistically significant.