Introduction

Eosinophilic esophagitis (EoE) is a chronic inflammatory disease of the esophagus triggered by immune hypersensitivity to food. Multiple lines of evidence, including molecular transcript profiling, cytokine expression, and genetic studies, have highlighted its close relationship with type 2 immune responses, and EoE is now considered a chronic form of food allergy [1]. EoE susceptibility is linked to a genetic factor at 2p23, the CAPN14 gene, which has tissue-specific expression in the esophagus [2, 3]. This genetic association has been replicated in multiple cohorts [3,4,5], adding credence to the importance of the 2p23 genetic association and resulting in a combined P value of 1.7 × 10⁻¹⁰. Genome-wide association studies (GWAS) have also identified EoE genetic risk loci that were linked to other allergic diseases [6]. For example, genetic variants at 5q22 encoding TSLP and WDR36 have been associated with allergic sensitization, asthma, allergic rhinitis, atopic dermatitis, and EoE, suggesting that these loci contain variants that participate in the allelic regulation of a molecular pathway that is central to the etiology of allergic disease [2, 4, 7,8,9,10,11,12,13,14]. Likewise, the 11q13 EoE risk locus encoding EMSY and LRRC32 has been robustly replicated in studies of EoE [4, 15, 16] and is also associated with atopic dermatitis [7, 17,18,19], asthma [9, 11, 20], allergic sensitization [20], allergic rhinitis [11], and inflammatory bowel disease [21]. Indeed, genome-wide approaches have demonstrated significant overlap of some EoE genetic risk loci across allergic diseases [1,2,3, 22].

The Immunochip was designed to genotype and fine-map genetic risk loci that were established for major immune-associated diseases including rheumatoid arthritis, ankylosing spondylitis, systemic lupus erythematosus, type 1 diabetes, autoimmune thyroid disease, ulcerative colitis, Crohn’s disease, psoriatic arthritis, multiple sclerosis, and celiac disease; notably, the latter has an increased prevalence in patients with EoE [23]. Since the introduction of the Immunochip in 2011, its use has contributed to a marked increase in known susceptibility loci and the comparison of susceptibility loci between phenotypes [24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40]. Herein, we probed the genetic etiology of EoE with multiple objectives, including (1) determining whether EoE risk loci would be shared with these immune-mediated diseases that have already been subjected to intense investigation; (2) identifying genetic variants with plausible function, as the Immunochip was enriched for functional variants; and (3) fine-mapping the human leukocyte antigen (HLA) region, as this region confers risk for other immune-mediated diseases. We report genetic analysis of EoE using the Immunochip platform and the largest cohort of subjects with EoE subjected to genetic analysis to date.

Results

To evaluate EoE risk at genetic loci associated with a variety of immune-associated diseases, 1210 subjects with EoE of European ancestry and 3734 population controls were genotyped using the Immunochip candidate genotyping array [23].

After stringent quality control based on Hardy–Weinberg disequilibrium and a call rate of >99% and lack of batch effect (described in Methods), 79,405 genetic variants had minor allele frequencies >1% and were used for this association study. The subjects with and without EoE were assigned to either the Local or External study cohorts (Supplemental Table 1). The Local cohorts included EoE patients from the Cincinnati Center for Eosinophilic Disorders (n = 966) and controls from the Cincinnati Genomic Control Cohort (n = 641). Controls from the Lupus Family Registry and Repository (n = 3093) and patients with EoE recruited outside of Cincinnati through the National Institutes of Health Consortium of Food Allergy Researchers (CoFAR) (n = 244) were assigned to the External cohort. Initially, one locus at 6p21 with genome-wide association (P < 5 × 10⁻⁸) and one locus at 16p13 with suggestive significance (P < 10⁻⁷) were identified (Fig. 1, Table 1). Next, independent experimental association was sought from the published GWAS by assessing the statistical significance of the most highly associated variant at each locus from the Immunochip analysis in the GWAS population after removing all subjects who overlapped between the two studies from the GWAS analysis [3] (Fig. 1, Table 1, Supplemental Table 1). No association was identified at 6p21 in the non-overlapping GWAS cohort; however, the genetic risk association at 16p13 was validated, resulting in genome-wide significance of the combined cohorts (P = 2.05 × 10⁻⁹) (Table 2). A logistic regression analysis demonstrated a single genetic effect, with all association in the locus accounted for by the genotype of rs12924112 (Fig. 2). This particular genetic variant was located in the 20th intron of CLEC16A.
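The text does not state how the Immunochip and non-overlapping GWAS results were combined into a single P value. As a purely illustrative sketch, one common approach is a fixed-effect, inverse-variance-weighted meta-analysis of per-cohort log odds ratios. The function name and the input numbers below are hypothetical and are not taken from this study.

```python
import math

def combine_fixed_effect(estimates):
    """Inverse-variance fixed-effect meta-analysis.

    estimates: sequence of (log_odds_ratio, standard_error) pairs, one per cohort.
    Returns the pooled log odds ratio, its standard error, and a two-sided P value.
    This is an assumed combining method, not necessarily the one used in the study.
    """
    estimates = list(estimates)
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided P from the standard normal
    return beta, se, p

# Hypothetical per-cohort effects (log odds ratio, standard error)
beta, se, p = combine_fixed_effect([(0.2, 0.05), (0.2, 0.05)])
print(f"pooled log OR = {beta:.3f}, SE = {se:.4f}, P = {p:.2e}")
```

With two cohorts of equal precision and concordant effects, the pooled standard error shrinks by a factor of √2, which is why a suggestive signal in one cohort can reach genome-wide significance once an independent cohort supports it.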

Fig. 1

Manhattan plot of the P values obtained from the Immunochip association analysis. Data are from 1210 subjects with eosinophilic esophagitis (EoE) and 3734 controls over 79,405 genetic variants with minor allele frequencies (MAFs) greater than 1% in the subjects with EoE. The −log10 value of each probability is shown as a function of genomic position on the autosomes. Genome-wide significance (red dashed line; P ≤ 5 × 10⁻⁸) and suggestive significance (solid blue line; P ≤ 1 × 10⁻⁷) are indicated

Table 1 Loci with significant or suggestive associations (P < 10⁻⁶) in the EoE Immunochip analysis
Table 2 Comparison of loci with significant or suggestive associations between Immunochip and genome-wide association analyses (P < 10⁻⁶)
Fig. 2

Genetic association of variants at the 16p13 locus with EoE risk. a P values (−log10) from the genetic association analysis of genotyped and imputed variants are plotted against the genomic position of each genotyped (blue) and imputed (red) single-nucleotide polymorphism (SNP) on the x axis on chromosome 16. b P values (−log10) from the genetic association analysis of genotyped and imputed variants, adjusted for the association of rs12924112, are plotted against the genomic position of each genotyped (blue) and imputed (red) SNP on the x axis on chromosome 16. Genes in the region are shown below. Position is given relative to Build 37 of the reference genome. Black lines indicate the recombination rates determined using subjects of European ancestry from the 1000 Genomes Project

The 16p13 locus has been associated with ten other immune-associated phenotypes ranging from atopic dermatitis and asthma with hay fever to the autoimmune diseases systemic lupus erythematosus and type 1 diabetes (Table 3) [11, 33, 34, 36, 38, 40,41,42,43,44,45,46,47,48,49]. On the basis of the linkage disequilibrium between the most highly associated disease protective variant in other diseases and the lead EoE 16p13 protective variant, variants decreasing risk for EoE also decrease risk for type 1 diabetes, multiple sclerosis, primary biliary cirrhosis, and systemic lupus erythematosus (Table 3). The lead variants reported for atopic dermatitis and asthma with hay fever were in relatively weak linkage disequilibrium with the EoE risk variants (Table 3).
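The comparison of lead variants across diseases rests on pairwise linkage disequilibrium, usually summarized as r². A minimal sketch of the standard calculation from haplotype and allele frequencies follows; the function name and the example frequencies are hypothetical and are not values from Table 3.

```python
def ld_r2(p_ab: float, p_a: float, p_b: float) -> float:
    """Squared correlation (r^2) between alleles at two biallelic variants.

    p_ab: frequency of the haplotype carrying allele A at variant 1 and allele B at variant 2
    p_a:  frequency of allele A at variant 1
    p_b:  frequency of allele B at variant 2
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("allele frequencies must be strictly between 0 and 1")
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))

# Hypothetical frequencies: alleles that always travel together versus independent alleles
print(ld_r2(0.30, 0.30, 0.30))  # alleles always co-occur on the same haplotype
print(ld_r2(0.09, 0.30, 0.30))  # haplotype frequency equals the product of allele frequencies
```

An r² near 1 means two lead variants tag the same haplotype, so a shared direction of effect can be inferred; an r² near 0 means the associations at the locus are likely driven by different genetic effects.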

Table 3 Other immune-associated diseases with a 16p13 genetic risk locus

The 16p13 locus encodes the genes CLEC16A, DEXI, and CIITA. These genes are known to be expressed in the esophageal mucosa [50,51,52,53] at levels similar to other tissues. Indeed, expression of the three genes was found in the esophageal biopsies of subjects with and without EoE (Fig. 3a). CLEC16A and DEXI were expressed in esophageal epithelial cells and were not modulated by IL-13 treatment, while CIITA was not expressed in esophageal epithelial cells (Fig. 3b). CLEC16A, DEXI, and CIITA are also expressed in various immune cell subsets (Fig. 3c) [54, 55]. In monocytes, the EoE risk haplotype at 16p13 is associated with increased expression of DEXI [56]. The same EoE risk haplotype has also been associated with increased expression of CLEC16A in B cell lines [57]. This suggests that genotype-dependent expression of DEXI and/or CLEC16A might lead to increased risk of EoE in patients with the 16p13 risk alleles.

Fig. 3

Expression of genes at the 16p13 locus. a RNAseq expression of CLEC16A, DEXI, and CIITA mRNA from esophageal biopsies (Control n = 10, EoE n = 10). No significant differences were identified between Control and EoE. b RNAseq expression of genes from esophageal epithelial cells in air-liquid interface culture system with or without IL-13 stimulation for 5 days (n = 3 wells per group). For a and b, bars represent the mean and error bars represent the standard deviation. No significant differences were identified between no treatment and IL-13 treatment. c Barcode z-score relative microarray expression of CLEC16A, DEXI, and CIITA in various human immune cell subsets downloaded from http://biospgs.org/ (ref. [48]). Reads per kilobase of transcript per million mapped reads, RPKM. Data are representative from multiple cellular subtypes in the Primary Cell Atlas dataset

Genetic variants at the 5q23 and 7p15 loci demonstrated modest association in the local and external Immunochip cohorts, but they failed to be reproduced by data from a previous GWAS analysis. Although they did not pass the threshold set by this study for significant association, they remain candidates to be further evaluated in subsequent studies. The suggestively associated EoE risk variants at the 5q23 locus are located in an intergenic region that is 14 million base pairs away from the EoE risk locus at 5q22 that encodes the TSLP and WDR36 genes with no linkage disequilibrium (R2 = 0.0005). The 7p15 locus near the gene JAZF1 has also been identified as a susceptibility locus for systemic lupus erythematosus, type 1 diabetes, and rheumatoid arthritis [58,59,60,61,62]. JAZF1 is also known as TIP27, and it encodes a transcription factor with three zinc fingers that often represses transcription [63].

The major histocompatibility complex (MHC), the Human Leukocyte Antigen (HLA) complex in humans, is a region of the genome on chromosome 6 that encodes genes that regulate antigen presentation to T cells. This region contains the most robustly and reproducibly associated risk variants for many immune-associated diseases including autoimmune and auto-inflammatory diseases; these genetic risk variants usually affect amino acid usage in the MHC molecules. The Immunochip was specifically designed to directly genotype variants across this locus, and its use has allowed teams to identify the genotype-dependent usage of MHC subtypes in diseases such as systemic lupus erythematosus [25], type 1 diabetes [64], and psoriatic arthritis [65]. Consistent with the three previous GWAS of EoE, we did not identify association of genetic variants that are located inside the MHC class I, II, or III genes. We did find association of rs599707 in 6p21 in both the local and external cohorts assessed on the Immunochip (Table 1, Supplemental Fig. 1); however, none of the variants in linkage disequilibrium (r2 greater than 0.8) changed amino acid usage in any gene. Based on the power analysis of the combined cohorts from the previous GWAS and present Immunochip studies (Supplemental Fig. 2), we can definitively confirm that there is no HLA association with EoE that is driven by variants with effect sizes greater than 1.4 or MAFs greater than 20%.

Discussion

We have probed the genetic basis of EoE focusing on genetic variants involved in a wide range of auto-immune and/or inflammatory diseases. We have identified one new genome-wide significant EoE risk locus at 16p13, a region encoding the CLEC16A, DEXI, and CIITA genes, and nominate three additional suggestive loci that warrant further analyses. The 16p13 finding identifies a region of the genome that includes genetic risk variants associated with numerous immune-associated diseases including both allergic and autoimmune diseases; however, it is notable that the vast majority of risk loci on the Immunochip did not reveal association with EoE, consistent with the uniqueness of its genetic etiology. The specific risk haplotype at 16p13 was not shared with atopic disease related to EoE based upon LD of the most highly associated variants for each phenotype, suggesting different genetic effects are driving the shared association at 16p13.

This study was designed to identify EoE genetic risk loci that demonstrated association in internal and external cohorts in addition to the previously published GWAS at a group of loci nominated by previous studies of immune-associated diseases. The Immunochip does not include previously reported EoE-risk loci, so we are unable to assess the established 2p23, 5q22, or 11q13 risk loci. The limited number of samples that remained in the GWAS cohort after removing overlapping individuals may explain why some of the suggestive associations from the Immunochip analysis were not validated when assessing the independent subjects in the GWAS cohort (Supplemental Fig. 2C). Specifically, after removing 542 overlapping cases with EoE and 587 overlapping controls from the GWAS of 9982 subjects, only 194 cases and 8659 controls remained, leaving the study with only 30% power to detect a locus with a large effect size (odds ratio of 2.0) and a high MAF (40%). This study also lacked the power to divide the patients with EoE into sub-classifications (e.g., patients with EoE responsive to proton pump inhibitors); however, future studies designed to identify genetic variants associated with the clinical presentation of EoE would be valuable.

The genotyping at the HLA region of the human genome is particularly dense on the Immunochip [23]. Over 200 diseases have robust HLA associations, especially autoimmune diseases [66], with effect sizes ranging from 1.3 to 3.0. It is notable that no HLA association has ever been identified for EoE despite numerous genome-wide studies and this Immunochip study. Ulcerative colitis, Crohn’s disease, and celiac disease are three gastrointestinal diseases with strong HLA associations [67,68,69,70,71,72,73]. In each of these non-EoE gastrointestinal diseases, a robust genetic association with variants across HLA is a hallmark of nearly every Immunochip and GWAS to date. Celiac disease is enriched in EoE patients [74,75,76,77,78]; likewise, patients with celiac disease have a 25% increased risk of developing EoE [76]. Indeed, celiac disease and EoE share features including being food antigen driven, involving a defective epithelial barrier, and resolving upon removal of causal foods. The lack of highly associated EoE risk variants that change MHC subtypes through nonsynonymous disease risk polymorphisms remains a striking differentiating factor for EoE. rs599707 at 6p21 is an expression quantitative trait locus (eQTL) for numerous HLA molecules in monocytes (HLA-DPB1, HLA-DQA1, HLA-DQB1, HLA-DRB1, HLA-C, and HLA-H; tag SNP: rs3131379, r² = 1 in people of European ancestry) [56]. Though the 6p21 locus demonstrated genome-wide association in the two cohorts assessed on the Immunochip, these variants were not identified as associated in an independent set of subjects with and without EoE assessed with the comprehensive OMNI5 array [3]. The HLA region is encoded from 6p21 and spans 3 million base pairs; the region is genetically complex with many independent haplotypes of variants in strong linkage disequilibrium [79]. The 6p21 EoE risk locus tagged by rs599707 is a highly polymorphic haplotype in the HLA that encodes 71 genes.
Only 3 genotyped and no imputed genetic variants in the region reached genome-wide significance, and 2 out of 3 of these variants failed quality assessment on the OMNI5 array [3] (Supplemental Fig. 1, Supplemental Table 2). Furthermore, the genotyped variant that passed quality assessment on both the Immunochip and GWAS studies had opposite effects in the two studies, i.e., the risk allele in the Immunochip study was protective in the non-overlapping cohort genotyped in the GWAS. While we have no reason to remove these variants from the analysis, two other pieces of data supporting their spurious association are the fact that the variants are not in linkage disequilibrium with each other (Supplemental Fig. 1B) and that variants in linkage disequilibrium with the “associated” variants in the 1000 Genomes Project in people of European ancestry are not associated at the same level of robust significance (Supplemental Fig. 1C). Given the lack of GWAS validation and the small number of associated variants at the locus, this study presents 6p21 as a candidate risk locus for EoE that needs further study before any robust conclusion can be established. If the rs599707 EoE risk genetic association is replicated in an independent dataset, the genotype-dependent expression of these HLA molecules should also be assessed in the context of subjects with EoE.

We have identified novel genome-wide association of EoE with variants at 16p13. This region was included in the Immunochip design based upon previous association in studies of multiple sclerosis and type 1 diabetes [23, 67, 80, 81]. Other allergic diseases also have genetic risk variants at the 16p13 locus, but it is notable that the genetic variants associated with EoE and other allergic diseases are not in linkage disequilibrium with each other (Table 3). The EoE risk variants at 16p13 are in strong LD with risk variants for multiple sclerosis and type 1 diabetes (Table 3). Among the genes at 16p13, CLEC16A is widely expressed across the immune system and contains an immunoreceptor tyrosine-based activation motif (ITAM). CLEC16A is also expressed in esophageal epithelial cells of subjects with and without EoE. Recently, CLEC16A has been shown to negatively regulate autophagy via modulating mTOR activity [82]. DEXI is named on the basis of its identified dexamethasone inducibility in airway epithelia [83]. DEXI is also differentially expressed in the lung tissue of patients with emphysema compared to normal lung tissue. Genotype-dependent expression of both CLEC16A and DEXI has been identified [56, 57], and chromatin looping has been demonstrated from the EoE risk variants shared with type 1 diabetes back to the promoter of DEXI [84, 85]. CIITA acts as the main positive transcriptional regulator of the class II major histocompatibility complex genes [86,87,88]. While not expressed in esophageal epithelial cell cultures (Fig. 3b), it is found in the esophageal biopsies of patients with and without EoE (Fig. 3a), perhaps due to expression in infiltrating immune cells. Further, the regulation of antigen presentation could be critical in the development of atopy. Thus, CLEC16A, DEXI, and CIITA each remain strong candidates for mediating EoE disease risk.

Altogether, this study presents a newly established EoE risk locus at 16p13 and demonstrates a relatively unique genetic etiology compared with nearly all autoimmune disease susceptibility loci.

Methods and materials

Genotyping

Genotyping was performed as previously described [3, 22] on the Illumina Immunochip genotyping array using Infinium2 chemistry. Genotypes were called using the Gentrain2 algorithm within Illumina Genome Studio.

Subjects included in the genetic analysis

The study was approved by the Institutional Review Boards at Cincinnati Children’s Hospital Medical Center (CCHMC) and all participating sites that were part of the NIH Consortium of Food Allergy Research (CoFAR) EoE Cohort (Mount Sinai Medical Center, University of North Carolina, Johns Hopkins University, University of Colorado Health Center/National Jewish Research Center, and Arkansas Children’s Hospital). Guardian informed consent was obtained for all participants under eighteen years of age in this study for the purpose of DNA collection and genotyping. Cases were confirmed by a physician to fulfill the diagnostic criteria for EoE. EoE is defined as peak eosinophil count ≥15 eosinophils/high-power field in esophageal biopsy sections; 30% of CCHMC and 51% of CoFAR subjects who were genotyped on the Immunochip had proton pump inhibitor (PPI) therapy before the diagnostic endoscopy. A similar strategy was used as in a previous GWAS [3]. Control subjects (non-EoE) included the subjects with self-reported European ancestry in the Cincinnati Genomic Control Cohort CCHMC (n = 641, age range 2–18 years) [89] and an external control cohort (non-EoE) acquired from the Lupus Family Registry and Repository (LFRR) in Oklahoma City, Oklahoma. The controls for the External cohort of the previous GWAS, used to further increase statistical power, were acquired from a database of Genotypes and Phenotypes (dbGAP) University of Michigan study (n = 8580) [3]. In the CCHMC and CoFAR cohorts, 73% and 62% of subjects with EoE were male, respectively, and subjects with EoE had an age range of 2–52 years. The external control cohort was also used in an Immunochip analysis of Systemic Lupus Erythematosus (SLE), and none of these subjects had an SLE diagnosis [25].

Population stratification

Correction for population stratification was performed as previously described [3]. Ancestry informative markers were used to infer the top six principal components of genetic variation and correct for possible population stratification using Eigensoft. All local cases and controls were self-identified as having European ancestry, and principal component analysis was used to exclude subjects (n = 376) who segregated >4 standard deviations outside of the mean of the first 5 principal components (Supplemental Fig. 3). After outlier removal, there were no significant differences between cases and controls in the first four principal components (all P > 0.1).
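The outlier rule described above can be sketched in a few lines. This is an illustration only, assuming principal component scores have already been computed (for example, by Eigensoft) and are supplied as one row per subject; the function name, the use of the population standard deviation, and the single-pass (non-iterative) removal are assumptions rather than details from the study.

```python
from statistics import mean, pstdev

def pc_outliers(pcs, n_components=5, max_sd=4.0):
    """Return sorted indices of subjects lying more than max_sd standard
    deviations from the mean on any of the first n_components.

    pcs: list of rows, each row holding the principal component scores of one subject.
    """
    flagged = set()
    for k in range(n_components):
        column = [row[k] for row in pcs]
        mu, sd = mean(column), pstdev(column)
        if sd == 0:
            continue  # no variation on this component, nothing to flag
        for i, value in enumerate(column):
            if abs(value - mu) > max_sd * sd:
                flagged.add(i)
    return sorted(flagged)

# Toy data: 30 tightly clustered subjects plus one far from the cluster on PC1
scores = [[0.0, 0.0]] * 30 + [[100.0, 0.0]]
print(pc_outliers(scores, n_components=2))
```

Flagged subjects would then be excluded before association testing, so that allele frequency differences due to ancestry are not mistaken for disease association.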

Genotyping quality control

Quality control on the variants from autosomal chromosomes was performed, as previously described [3]. Variants were assessed in this study if they met the following criteria: minor allele frequency greater than 1% and no significant deviation from Hardy–Weinberg equilibrium in the controls (variants with P < 10⁻⁴ were excluded). We controlled for the presence of potential batch effects by removal of SNPs that exhibited outlier fluorescence associated with deviation between plates (P < 10⁻⁴), as per the manufacturer’s recommendation. The final genotyping rate for all SNPs was 96.3%. After applying the above filters, genotypes from 79,405 autosomal SNPs in 1210 subjects with EoE and 3734 subjects without EoE were used in the final analyses (Supplemental Table 1).
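The two variant-level filters above (minor allele frequency and Hardy–Weinberg equilibrium in controls) can be sketched as follows. This is a simplified illustration: it uses the 1-degree-of-freedom chi-square test, whereas analysis software often applies an exact test, and it computes the allele frequency from the control genotype counts only. The function names and the example counts are hypothetical.

```python
import math

def minor_allele_frequency(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Minor allele frequency from genotype counts (AA, AB, BB)."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)
    return min(p, 1.0 - p)

def hwe_chi2_p(n_aa: int, n_ab: int, n_bb: int) -> float:
    """P value of the 1-df chi-square test for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return math.erfc(math.sqrt(chi2 / 2.0))  # chi-square survival function, 1 df

def passes_qc(control_counts, min_maf=0.01, hwe_threshold=1e-4):
    """Keep a variant that is common enough and shows no strong HWE deviation in controls."""
    return (minor_allele_frequency(*control_counts) > min_maf
            and hwe_chi2_p(*control_counts) > hwe_threshold)

# Hypothetical genotype counts in controls
print(passes_qc((250, 500, 250)))  # genotype proportions match HWE expectations
print(passes_qc((500, 0, 500)))    # no heterozygotes at all, a typical genotyping artifact
```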

Genetic association analysis and imputation to the 1000 genomes reference panel

Association analyses were performed in PLINKv1.9 and SNPTESTv2.5.2 [90]. To detect associated variants that were not directly genotyped, highly associated regions were imputed with IMPUTE2 using a composite imputation reference panel of integrated haplotypes from the 1000 Genomes Project sequence data freezes from August 2012 [91, 92]. Imputed genotypes were required to meet or exceed a probability threshold of 0.9, an information measure of >0.5, and the same quality-control criteria threshold described for the genotyped autosomal markers. Genome-wide significance was set at P values ≤5 × 10⁻⁸.
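The imputation thresholds above can be read as two separate filters: a per-genotype posterior probability cutoff and a per-variant information cutoff. The sketch below illustrates that reading. The function names and the example probabilities are hypothetical, and this code does not call IMPUTE2 or SNPTEST.

```python
def hard_call(probs, min_prob=0.9):
    """Convert posterior genotype probabilities into a genotype call.

    probs: probabilities for 0, 1, and 2 copies of the coded allele.
    Returns the most likely copy number, or None when no genotype
    reaches min_prob (the genotype is then treated as missing).
    """
    best = max(range(3), key=lambda g: probs[g])
    return best if probs[best] >= min_prob else None

def variant_passes(info, min_info=0.5):
    """Keep an imputed variant only when its information measure exceeds min_info."""
    return info > min_info

def is_genome_wide_significant(p_value, alpha=5e-8):
    """Apply the genome-wide significance threshold used in the text."""
    return p_value <= alpha

# Hypothetical posterior probabilities for three subjects at one imputed variant
calls = [hard_call(p) for p in [(0.02, 0.95, 0.03), (0.50, 0.40, 0.10), (0.90, 0.10, 0.00)]]
print(calls)
```

The threshold of 5 × 10⁻⁸ corresponds to a Bonferroni-style correction of α = 0.05 for roughly one million independent common variants, which is the conventional justification for this cutoff.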

RNA sequencing

RNA was isolated from esophageal biopsies of subjects with active EoE disease and unaffected controls and from EPC2 esophageal epithelial cells grown at an air–liquid interface, as previously described [3, 22, 93]. RNA sequencing acquiring 50 million mappable 125 base-pair reads from paired-end libraries was performed at the Genetic Variation and Gene Discovery Core Facility at CCHMC. Data were aligned to the GRCh37 build of the human genome using the Ensembl [94] annotations as a guide for TopHat [95]. Expression analysis was performed using DESeq2 in BioWardrobe [96, 97]. The expression studies were well powered to identify 2-fold differences in gene expression (power of 1.0 for 2-fold changes with α of 0.05 and variance of 30% in biopsy data (Fig. 3a) and 10% variance in the in vitro cell line data (Fig. 3b)). Datasets are deposited in NCBI GEO: GDS3223 and GSE58640.
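Expression values in Fig. 3 are reported in RPKM (reads per kilobase of transcript per million mapped reads). A minimal sketch of that normalization follows, using the standard definition; the function name and the example numbers are hypothetical, and the actual values in this study were produced by the pipeline cited above.

```python
def rpkm(read_count: int, transcript_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads.

    RPKM = read_count * 1e9 / (transcript_length_bp * total_mapped_reads)
    """
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)

# Hypothetical gene: 1000 reads on a 2 kb transcript in a library of 50 million mapped reads
print(rpkm(1000, 2000, 50_000_000))
```

Dividing by transcript length and library size makes expression comparable across genes of different lengths and across samples sequenced to different depths.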