Genetic control of CCL24, POR, and IL23R contributes to the pathogenesis of sarcoidosis

Sarcoidosis is a genetically complex systemic inflammatory disease that affects multiple organs. We present a GWAS of a Japanese cohort (700 sarcoidosis cases and 886 controls) with replication in independent samples from Japan (931 cases and 1,042 controls) and the Czech Republic (265 cases and 264 controls). We identified three loci outside the HLA complex, CCL24, STYXL1-SRRM3, and C1orf141-IL23R, which showed genome-wide significant associations (P < 5.0 × 10−8) with sarcoidosis; CCL24 and STYXL1-SRRM3 were novel. The disease-risk alleles in CCL24 and IL23R were associated with reduced CCL24 and IL23R expression, respectively. The disease-risk allele in STYXL1-SRRM3 was associated with elevated POR expression. These results suggest that genetic control of CCL24, POR, and IL23R expression contribute to the pathogenesis of sarcoidosis. We speculate that the CCL24 risk allele might be involved in a polarized Th1 response in sarcoidosis, and that POR and IL23R risk alleles may lead to diminished host defense against sarcoidosis pathogens.


Results
GWAS discovery stage. We analyzed 530,466 autosomal SNPs from 685 patients and 847 controls after performing sample and SNP quality control procedures (see "Genome-wide association study and quality control procedures" section). The genomic inflation factor (λ) values were 1.060 and 1.056, with and without HLA region SNPs, respectively (Supplementary Fig. 1). This result indicated that population stratification had a negligible effect. A principal component analysis showed that the cases and controls in the GWAS discovery stage were genetically well matched ( Supplementary Fig. 2).
The HLA class II region displayed the most significant association with sarcoidosis (rs9274741; the P-value was corrected for the first five ancestry principal components and genomic inflation; [P GC ] = 2.5 × 10 −16 , odds ratio [OR] = 2.34, 95% confidence interval [CI] = 1.90-2.88). We identified 17 SNPs in the HLA complex that reached the genome-wide significance threshold in or near the following four genes: NOTCH4, BTNL2, HLA-DQB1, and HLA-DQA2 ( Fig. 1; Supplementary Table 1). A stepwise regression analysis showed that variants of these four gene regions were independently associated with sarcoidosis (Supplementary Table 1). Of the HLA alleles, only HLA-DRB1*0803 showed genome-wide significance as a risk factor for sarcoidosis (P GC = 4.1 × 10 −8 , OR = 1.82, 95% CI = 1. 46-2.26); HLA-DQB1*0601 showed the next highest significance as a risk allele for sarcoidosis (P GC = 1.5 × 10 −7 , OR = 1.32, 95% CI = 1.12-1.56). In contrast, HLA-B*0702, HLA-DRB1*0101, and HLA-DQB1*0501 exhibited genome-wide significance as protective alleles (Supplementary Table 2). These associations between HLA class II alleles and sarcoidosis were consistent with previous findings in Japanese populations 3,4 . Indeed, the association between HLA-B*0702 and sarcoidosis was first identified in a Japanese population; in contrast, HLA-B*07 was associated with increased risk of sarcoidosis in a Swedish population 5 . Significant SNPs in the 6p21.32-6p21.33 region that included these genes were not analyzed in our replication study, because it had been well-established that sarcoidosis was strongly associated with that genomic region.
In the C1orf141-IL23R locus, the SNPs, rs117633859 and rs117282985, located~4 kb upstream of IL23R, showed the most significant association in the meta-analysis across all three cohorts (P GC = 2.0 × 10 −12 , OR = 1.87, 95% CI = 1.57-2.23), with low heterogeneity (I 2 = 16.8%, P = 0.30). These two variants were strongly associated with sarcoidosis in the two Japanese cohorts (P < 1.0 × 10 −5 ). Although they did not attain significance in the Czech cohort, the direction of the effect of these risk alleles was consistent between the Japanese (OR = 1.88-2.01) and Czech cohorts (OR = 1.21; Table 1 and Fig. 2c). The IL23R rs11209026 (Arg381Glu) was previously reported to be associated with chronic sarcoidosis in a European cohort 11 . Another study with mixed ethnicities (including mainly populations of European descent) reported that IL23R rs11209026 and rs11465804 were associated with sarcoid uveitis, and that IL23R rs7517847 was associated with sarcoidosis without uveitis 12 . In the current study, these three SNPs were not significantly associated with sarcoidosis in our Japanese or Czech cohorts; also, rs11465804 and rs11209026 were monomorphic in the Japanese cohort (Supplementary Table 3). Furthermore, these three SNPs were not in LD with rs117633859 in either population (Japanese: D′ ≤ 0.33, r 2 ≤ 0.01; Czech: D′ ≤ 0.37, r 2 ≤ 0.03; Supplementary Fig. 4). The recent Immunochip study showed that rs12069782 in C1orf141 was strongly associated with sarcoidosis in German and Swedish populations, but not in Czech or African-American populations 10 . Our study found that rs12069782 had the second strongest association with disease in the locus of our Japanese cohorts (P GC = 1.1 × 10 −11 , OR = 1.89, 95% CI = 1.56-2.25 across two Japanese cohorts), but this SNP was not significantly associated with disease in our Czech cohort (Supplementary Table 3), which included most of the Czech individuals used in the Immunochip study. Moreover, rs12069782 was in strong LD with rs117633859 in the Japanese cohort, but not in the Czech cohort (Japanese: D′ = 0.97, r 2 = 0.93; Czech: D′ = 1.00, r 2 = 0.10; Supplementary Fig. 4). In our Czech cohort, the intronic SNPs, rs6664119 and rs11209013, had the lowest P-values in the locus for association with disease (P = 0.0048, OR = 1.42, 95% CI = 1.11-1.82) (P GC = 4.5 × 10 −7 , OR = 1.26, 95% CI = 1.15-1.38 across all three cohorts; Table 1). rs6664119 and rs11209013 were more strongly linked to rs3762318 and rs12069782 than with any other SNPs, including rs117633859, in the Czech cohort ( Supplementary Fig. 4).
To identify the most plausible SNPs within the identified loci, we performed a fine-mapping analysis with FINEMAP 13 , based on the results from the GWAS discovery cohort ( Fig. 2; Supplementary Data 2). Fine-mapping of CCL24 indicated that the rs4728493 had the highest posterior inclusion probability (0.1179) of being the causal variant at the locus. At STYXL1-SRRM3, rs112463197 and rs112680895 had the highest posterior inclusion probability (0.0649). At C1orf141-IL23R, rs117633859 and rs117282985 had the highest posterior inclusion probability (0.0765), which was nearly threefold larger than the secondhighest posterior inclusion probability (0.0243 for 17129680). eQTL analysis of CCL24, STYXL1-SRRM3, and IL23R. The disease-associated SNPs in the CCL24, STYXL1-SRRM3, and IL23R loci were located in non-coding regions. Because noncoding variants can significantly affect the expression of genes, we performed an eQTL analysis of CCL24, STYXL1-SRRM3, and IL23R (Supplementary Table 4). Our eQTL analysis revealed that the disease-risk alleles in CCL24 (the C allele of rs4728493) and IL23R (the G allele of rs117633859 and the C allele of rs6664119) were significantly associated with reduced expression of CCL24 and IL23R, respectively, in whole blood. Publicly available data from the GTEx Portal 14 showed a significant association between reduced CCL24 expression and the risk allele of CCL24 in cultured fibroblasts, ovary, tibial nerves, and skin (P multi-tissue meta-analysis = 1.1 × 10 −19 in a meta-analysis of tissues/ organs tested in the GTEx Portal; Supplementary Fig. 5). For STYXL1-SRRM3, both our eQTL analysis and the GTEx Portal showed that the disease-risk allele (the T allele of rs112463197) was significantly associated with increased POR expression. Moreover, the GTEx Portal demonstrated that the T allele had an eQTL effect on POR in many tissues/organs (P multi-tissue meta-analysis = 5.6 × 10 −184 ; Supplementary Fig. 6).
Stratified analysis by CXR stage and Löfgren's syndrome. Table 2 shows the results of the association analysis of the disease-risk alleles in CCL24, STYXL1-SRRM3, and IL23R after stratifying according to the chest X-ray (CXR) stage and the presence of Löfgren's syndrome. The OR of the CCL24 rs4728493 C allele significantly declined according to an incline in CXR stages: 0-I, II, and III-IV, in both the Japanese and Czech cohorts (Mantel-extension test: P = 0.045). In the Czech cohort, the C allele had the highest OR (3.11, 95% CI = 0.94-10.32) in patients with Löfgren's syndrome. We also found that the STYXL1-SRRM3 rs112463197 T allele had the highest OR (2.08, 95% CI = 1.19-3.64) in patients with Löfgren's syndrome in the Czech cohort, and it had a higher OR in patients with CXR stages 0-II than in patients with stages III-IV in both the Japanese and Czech cohorts, although the trend was not significant. On the other hand, the ORs of the risk alleles in IL23R were not affected by the presence of Löfgren's syndrome or CXR stages.

Discussion
To the best of our knowledge, this was the first GWAS in sarcoidosis performed in an East-Asian, Japanese population. We identified three loci, CCL24, STYXL1-SRRM3, and C1orf141-IL23R, which showed genome-wide significant associations (P GC < 5.0 × 10 −8 ) with the disease in a meta-analysis of the GWAS and replication cohorts, which included a European, Czech population. Importantly, two of the three identified loci, CCL24 and STYXL1-SRRM3, were novel in the context of sarcoidosis. Moreover, all three loci showed functional relevance; the CCL24 and IL23R risk variants were associated with reduced CCL24 and IL23R expression, respectively; and the STYXL1-SRRM3 risk variant was associated with elevated POR expression.
By interacting with CCR3, CCL24 (C-C motif chemokine ligand 24, also known as eotaxin-2) is an important mediator in Th2 cell-mediated allergic inflammation occurring in asthma, allergic rhinitis, and atopic dermatitis. However, a highly polarized Th1 cytokine profile at disease sites is a typical immunological feature of sarcoidosis 16 . Indeed, Th1 cytokines, such as IFNγ, TNF-α, IL-2, IL-12, and IL-18, have important roles in sarcoid granuloma formation 1,17 . The current study was the first to describe an association between CCL24 variants and sarcoidosis. We identified an intronic SNP, rs4728493, in the CCL24 region. The risk allele (C) of rs4728493 was significantly associated with reduced CCL24 expression. Reduced CCL24 expression would downregulate the Th2 response; thus, our findings suggested that the C allele may contribute to a Th1-biased immune response.
The Th1/Th2 balance has been shown to shift, during the course of sarcoidosis, toward an immune response characterized by Th2 cytokines 1,16 . This shift suggested that the transition from Th1 to Th2 might have a critical role in the development of the persistent form of the disease, which progresses to fibrosis. Bronchoalveolar fluid samples from patients with stage III disease (based on CXRs) showed increased CCL24 levels compared to samples from patients with Löfgren's syndrome, which is a subtype of acute sarcoidosis that has a better prognosis than other disease subtypes 18 . Recent studies suggested that the Th2 response and CCL24 were most likely involved in the development and persistence of the disease 19,20 . Our study found that the OR of the rs4728493 C allele declined according to an incline in CXR stages: 0-I, II, and III-IV, and that the C allele had the highest OR in patients with Löfgren's syndrome. These findings suggested that the C allele might be involved in a polarized Th1 response in the early stages of sarcoidosis and in the acute disease. Moreover, the C allele might contribute to suppressing disease (fibrosis) progression through its effect on CCL24 expression. This hypothesis was supported by the following findings: (a) CCL24 stimulated lung fibroblast proliferation 21 ; (b) blockade of CCL24 inhibited skin and lung inflammation and fibrosis 22 ; and (c) eQTL analysis demonstrated the reduced CCL24 expression associated with the C allele in cultured fibroblasts and skin.
POR (cytochrome p450 oxidoreductase) is a flavoprotein that transfers electrons from NADPH to all microsomal (type 2) cytochrome P450 enzymes, including steroidogenic P450c17, P450c21, and P450aro. Thus, POR has a direct role in steroidogenesis 23 . Mutations in POR cause diseases associated with abnormal steroidogenesis, such as congenital adrenal hyperplasia 24 and Antley-Bixler syndrome 25 ; however, the relationship between POR and sarcoidosis has not yet been elucidated. Our study found that the risk T allele of a sarcoidosis-associated SNP, rs112463197, in the STYXL1-SRRM3 locus was associated with increased POR expression in many of the organs affected by sarcoidosis, including the lungs, skin, sigmoid colon, esophagus, minor salivary gland, kidney, spleen, heart, nerves, pituitary, spinal cord, and liver. This increase could lead to the enhancement of endogenous steroidogenesis, which was reported to induce immunosuppression and result in increased susceptibility to infectious agents 26,27 . Various pathogens have been suggested to contribute to sarcoidosis development; recent studies have focused on the role of mycobacterial or propionibacterial organisms in the disease etiology 1,16 . Therefore, the T allele of rs112463197 might induce immunosuppression, due to enhanced endogenous steroid levels, and lead to diminished host defense against pathogens involved in the development of sarcoidosis. In our study, the rs112463197 T allele showed high ORs in patients with Löfgren's syndrome and in patients with CXR stages 0-II. These patients generally have a good prognosis, or a high/moderate rate of spontaneous remission 2 , compared to patients with stage III or IV, who exhibit little or no spontaneous remission 2 . The symptoms of sarcoidosis often ameliorate during pregnancy, due to the favorable effect of increased levels of endogenous steroid hormones, such as corticosteroids [28][29][30] . Therefore, the T allele might contribute to spontaneous sarcoidosis remission, due to its effect on endogenous steroid levels, but it might also be a risk factor for disease development.
IL23R (interleukin 23 receptor) encodes a subunit of the receptor for IL-23. IL-23 has been shown to stimulate Th17 cell proliferation and increase the production of inflammatory cytokines (e.g., IL-1, IL-6, IL-17, and TNFα) 31 . Because Th17 cells induce neutrophil and macrophage chemotaxis, they have important roles in host defense against extracellular pathogens and in the development of immune-mediated diseases 32 . Recent studies have identified IL23R as a susceptibility gene for several inflammatory/immune-mediated diseases [33][34][35][36][37] . The IL23R SNPs identified in Crohn's disease, ulcerative colitis, psoriasis, ankylosing spondylitis, and Behçet's disease were located in different The G allele of rs117633859 and the C allele of rs6664119 are risk alleles for sarcoidosis in the Japanese and Czech populations, respectively. c Patients that had sarcoidosis for more than 5 years were selected for this analysis to ensure an established CXR stage. No patient had Löfgren's syndrome in the Japanese population.
LD blocks from the LD blocks of the sarcoidosis-associated SNP (rs117633859) identified in this study (Supplementary Fig. 4). Moreover, those IL23R SNPs were not associated with sarcoidosis in our Japanese or Czech cohorts (Supplementary Table 3). Only one SNP associated with Crohn's disease and ulcerative colitis, rs76418789 (IL23R Gly149Arg), was significantly associated with sarcoidosis (P < 0.05) in both our Japanese discovery and replication cohorts. However, the ORs indicated opposite associations (0.64 vs. 1.42) in the two Japanese cohorts (Supplementary  Table 3). Interestingly, in our Japanese population, one of the SNPs we identified was previously associated with Vogt-Koyanagi-Harada (VKH) disease in Han Chinese patients 37 . This SNP, rs117633859, showed the strongest association with both sarcoidosis and VKH disease 37 of all the SNPs in the IL23R locus. Moreover, the G allele, which was associated with reduced IL23R expression, conferred risk for both diseases. Therefore, the rs117633859 G allele might downregulate IL23R expression, and carriers might have a diminished Th17 response, and thus, a diminished host defense against pathogens. This mechanism might explain the link between rs117633859 and sarcoidosis development. Indeed, stimulation with Propionibacterium acnes elicited significantly lower IL17 expression in peripheral blood mononuclear cells from patients with sarcoidosis than in those from healthy controls. Conversely, the expression levels of Th1 cytokine genes, IL2 and IL12, were significantly higher in patients than in controls 38 . Those findings suggested that sarcoidosis might arise from an imbalance in Th1/ Th17 immune responses to sarcoidosis-associated pathogens.
A limitation of our study was that only the identified loci were explored with imputation; thus, the whole genome data were not imputed. Therefore, we may have missed some potential loci that conferred susceptibility to sarcoidosis. In future studies, an imputation analysis based on our GWAS results is needed for a comprehensive clarification of the genetic factors associated with sarcoidosis. On the other hand, replication cohorts from two distinct populations and of sizes quite substantial in context of sarcoidosis represent strengths of our study.
Our study revealed substantial differences in the allele distribution of numerous SNPs between Asian Japanese and Caucasian Czech controls, even for the CCL24 lead SNP associated with sarcoidosis (Table 1). This finding indirectly supported the concept that differences in genetic background can determine ethnic differences in sarcoidosis prevalence and phenotypes. To our knowledge, this study was the first GWAS of sarcoidosis in an Asian population. We found three genome-wide significant non-HLA loci, CCL24, STYXL1-SRRM3, and C1orf141-IL23R, that conferred susceptibility to sarcoidosis. Our findings suggest that genetic control, through genetic polymorphisms in CCL24, POR, and IL23R, may have an important physiological role in the development and progression of sarcoidosis.

Methods
Subjects. A total of 700 unrelated patients with sarcoidosis and 886 unrelated healthy controls, all of Japanese descent, were enrolled in the GWAS. For replication, we included additional samples from Japan (931 cases and 1,042 controls) and the Czech Republic (265 cases and 264 controls; all Caucasians from Central Europe). Sarcoidosis was diagnosed in Japanese patients according to the "Diagnostic Criteria and Guidelines for Sarcoidosis" developed by the Japanese Society of Sarcoidosis and Other Granulomatous Disorders (JSSOG 2007) 39  Hospital, Sapporo Medical University, and Health Sciences University of Hokkaido. All Japanese controls were healthy volunteers unrelated to each other or to the patients. Sarcoidosis was diagnosed in Czech patients according to the criteria in the ATS/ERS/WASOG International Consensus Statement 2 , and they were recruited at a single tertiary referral center (Palacky University Hospital, Olomouc, Czech Republic). The Czech controls were healthy participants in the bone marrow donor registry, recruited from the same region as the patients. All Czech participants were Caucasians from Central Europe. The absence of lung disease in the controls was confirmed by a health questionnaire and interview, which emphasized family history and symptoms of respiratory disease. Sarcoidosis was staged according to findings on CXRs, as follows: stage 0, normal; stage I, bilateral hilar lymphadenopathy (BHL); stage II, BHL with pulmonary infiltrates; stage III, infiltrates without BHL; stage IV, pulmonary fibrosis. Stages III and IV are the most advanced forms of the disease; they are characterized by parenchymal involvement and fibrosis, with a low rate of spontaneous resolution. This study was approved by the Ethics Committee of Yokohama City University (approval number: A150122003). The study complied with the guidelines of the Declaration of Helsinki. The study details were explained to all patients and controls, and written informed consent was obtained for genetic screening.
We used the QIAamp DNA Blood Maxi Kit (QIAGEN, Hilden, Germany) to collect peripheral blood lymphocytes and extract genomic DNA from peripheral blood cells. Procedures were performed under standardized conditions to prevent variation in DNA quality.
Genome-wide association study and quality control procedures. Genotyping was performed with the Illumina HumanOmniExpress chip (727,413 SNPs) using the standard protocol recommended by Illumina (San Diego, CA, USA). SNPs were excluded, based on the following criteria: a call rate <98%; the rates of missing data were significantly different between cases and controls (P < 1.0 × 10 −6 ); an overall minor allele frequency <5%; and a significant deviation from Hardy-Weinberg equilibrium (HWE) in controls (P < 0.0001). For the samples, we set the minimum SNP call rate to 99%. In addition, cryptic relatedness between samples was estimated based on identity by descent; closely related samples with a pi-hat >0.1875 were eliminated. We also excluded samples with excessive heterozygosity (>5 standard deviations from the mean sample heterozygosity). Finally, a total of 530,466 autosomal SNPs (685 cases and 847 controls) that passed through the filters were used for subsequent statistical analyses. Quantile-quantile plots and the genomic inflation factor (λ) were used to assess the presence of systematic bias in the test statistics, due to potential population stratification. We also performed a principal component analysis to visualize the degree of genetic stratification between cases and controls.
Replication of the GWAS findings. We tested the lead SNPs from each of the GWAS-identified loci with the Illumina GoldenGate assay according to the standard protocol recommended by Illumina for replication in 1,973 Japanese and 529 Czech individuals. For samples, we set the minimum SNP call rate to 99%, and we obtained genotyping results for 1,949 Japanese and 508 Czech individuals. For a meta-analysis of the GWAS discovery and two replication cohorts, we used SNPs that had a consistent direction and magnitude of allelic effect across all three cohorts.
To highlight the potentially causal SNPs in the identified loci, we performed fine-mapping with FINEMAP v1.2 software 13 . This software uses a shotgun stochastic search algorithm. We ran FINEMAP with the assumption that there would only be one causal variant in each locus. We calculated the posterior inclusion probabilities and log 10 Bayes factor to assess the causality of each SNP within the identified loci.
Imputation analysis for HLA alleles. The genotypes of 6,181 SNPs located in the HLA region were selected from the GWAS data. For imputation, we used SNP2HLA v1.0.3 (https://www.broadinstitute.org/mpg/snp2hla/) 43 and a reference panel of 530 pan-Asian samples 44 . Imputed HLA alleles with r 2 > 0.8 were included in the association analysis. We also directly genotyped HLA-DRB1 alleles for 700 cases with Luminex reverse sequence-specific oligonucleotides and bead kits (One Lambda, Canoga Park, CA, USA). The concordance rate between imputed and direct genotyping was 96.4% for four-digit HLA-DRB1 types.
Expression analysis. Whole blood was collected from 342 healthy Japanese individuals in PAXgene Blood RNA tubes (Becton Dickinson, Heidelberg, Germany). Total RNA was extracted from whole blood with the PAXgene Blood RNA Kit (QIAGEN) according to the manufacturers' protocols. cDNA was synthesized with the SuperScrip IV VILO Master Mix (Thermo Fisher Scientific, Waltham, MA, USA).
Expression analyses for the targeted genes were performed with the TaqMan Gene Expression Assays (Applied Biosystems, Foster City, CA, USA) and the StepOnePlus Real-Time PCR System (Applied Biosystems), according to the manufacturers' protocols. Relative expression was quantified with the comparative Ct (ΔΔCt) method, and GAPDH gene expression served as an endogenous reference. Expression data were also extracted from the GTEx Portal online database (version 8, https://www.gtexportal.org/home/) 14 on March 5, 2020 and Pvalue significance threshold was applied as described in the original report 45 .
We also analyzed colocalization with eCAVIAR 15 to identify potentially causal SNPs that were responsible for in both GWAS and eQTL signals within the identified loci. For the colocalization analysis, we used each Z-score obtained from the genetic association results for the Japanese GWAS discovery data set and from the whole-blood eQTL results for the healthy Japanese data set; we also used each LD structure, based on the genotype data from the Japanese GWAS and eQTL data sets. We selected SNPs with Z-scores > 3 and >2, in the genetic association and eQTL analyses, respectively. eCAVIAR estimated the colocalization posterior probability (CLPP), which was the probability that the same variant was causal in both the GWAS and eQTL data. We considered SNPs with CLPP > 0.01 significant, as recommended in the original report 15 .
Statistics and reproducibility. The quality control procedures for GWAS, all association analyses, the stepwise regression analyses, and the eQTL data analyses were carried out with SNP & Variation Suite software (version 8.8.3, Golden Helix, Bozeman, MT, USA). Association analyses were performed with an additive genetic model in the Japanese GWAS discovery cohort (685 cases and 847 controls), and the obtained P-values were adjusted for the first five ancestry principal components and genomic inflation (P GC ) in the GWAS discovery stage to correct for the potential presence of population stratification. In the replication analysis using the Japanese (907 cases and 1042 controls) and Czech cohorts (252 cases and 256 controls), we also performed association analyses under an additive genetic model. Meta-analyses for the various populations were performed with the inverse variance-weighted fixed-effect model with the META v1.7 program 46 . A Cochran's Q test was used to assess heterogeneity across populations. The genome-wide significance threshold was set to P GC < 5 × 10 −8 . Regional association plots for the target regions were generated with LocusZoom v1.3 47 . The LD structures of the targeted regions were inferred with LocusZoom v1.3 and Haploview 4.1 software 48 .
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
The majority of the data generated or analyzed during this study are included in this published article and its Supplementary Information files. The summary statistics of the GWAS are included in Supplementary Data 3. Other data supporting the findings of this study are available from the corresponding author on reasonable request.