Cytokines mapping for tissue-specific expression, eQTLs and GWAS traits

Dysregulation in cytokine production has been linked to the pathogenesis of various immune-mediated traits, in which genetic variability contributes to the etiopathogenesis. GWA studies have identified many genetic variants in or near cytokine genes, nonetheless, the translation of these findings into knowledge of functional determinants of complex traits remains a fundamental challenge. In this study we aimed at collection, analysis and interpretation of data on cytokines focused on their tissue-specific expression, eQTLs and GWAS traits. Using GO annotations, we generated a list of 314 cytokines and analyzed them with the GTEx resource. Cytokines were highly tissue-specific, 82.3% of cytokines had Tau expression metrics ≥ 0.8. In total, 3077 associations for 1760 unique SNPs in or near 244 cytokines were mapped in the NHGRI-EBI GWAS Catalog. According to the Experimental Factor Ontology resource, the largest numbers of disease associations were related to ‘Inflammatory disease’, ‘Immune system disease’ and ‘Asthma’. The GTEx-based analysis revealed that among GWAS SNPs, 1142 SNPs had eQTL effects and influenced expression levels of 999 eGenes, among them 178 cytokines. Several types of enrichment analysis showed that it was cytokines expression variability that fundamentally contributed to the molecular origins of considered immune-mediated conditions.


Results
The overall study design. The flowchart of study design is shown in Fig. 1. After building the list of genes encoding proteins referred to as cytokines, we performed genomic characterization of cytokines expression, which included the analysis of expression tissue specificity, genome-wide detection of cytokine eQTLs, and examination of the direction of eQTL effects on target gene pairs. Next, we described the phenotypic spectrum of cytokine gene associations in the NHGRI-EBI GWAS Catalog and conducted an integrative genomic investigation of cytokine SNPs represented in the NHGRI-EBI GWAS Catalog. This investigation consisted of the following steps: compiling the lists of GWAS trait-associated SNPs (index SNPs) and their LD SNPs, functional annotation of index and LD SNPs, natural selection analysis, and eQTL analysis of GWAS Catalog SNPs in cytokine genes. To identify functional effects of GWAS-identified variants by the means of eQTL analysis, we analysed the distribution of eQTLs by genomic region and disease spectrum, performed the analysis of tissue specificity of cytokine genes eQTLs for each trait in the GTEx panel, conducted ARCHS4 Tissues and Gene Ontology enrichment analyses for eQTLs' target genes, and calculated Jaccard tissue similarity indexes for cytokine eQTLs. This last item was aimed at establishing the role of cytokine eQTLs in cytokine expression tissue specificity.
The list of genes encoding proteins with cytokine and cytokine receptor activity. The list of 314 genes encoding proteins with cytokine/chemokine activity and cytokine/chemokine receptor activity (combined under the name 'cytokines') was constructed with the QuickGo tool 7 (Table 1, Supplementary Table S1). The terms cytokine activity (GO:0005125), chemokine activity (GO:0008009), cytokine receptor activity (GO:0004896) and chemokine receptor activity (GO:0004950) yielded, correspondingly, 219, 50, 70 and 25 genes. Two types of activities for corresponding proteins were identified for 51 genes. We then classified cytokines on the basis of having growth factor activity (GO:0008083). It was attributed to 68 genes, of which 67 genes encoded proteins with cytokine activity.
Tissue-specific expression of cytokines. In the GTEx V8 database 8 , we found 310 genes, among which four genes were not detected in any tissues and seven genes were expressed in a single tissue (Fig. 2a, Supplementary Table S2 for GTEx tissue abbreviations, Supplementary Table S3). Tissue-pairwise Spearman rank correlation of gene expression values showed positive correlation in expression (mean Spearman's rho = 0.79, range 0.40-0.99). The lowest levels of correlation were observed for hemic and immune-related cells and tis- www.nature.com/scientificreports/ sues (cells-EBV-transformed lymphocytes, spleen and whole blood) between themselves, and between other tissues (Fig. 2b). Using the expression specificity metric Tau 9 and tissue specificity index TSI 10 , we classified 255 (82.3%) genes as tissue-specific (Tau ≥ 0.8) (Fig. 2c), of which 110 genes had TSI ≥ 0.3. Based on these thresholds, we determined 20 tissue-specific genes with the highest levels of expression in corresponding tissues (Fig. 2d).
Genome-wide detection of cytokine eQTLs. Next, we performed the expression quantitative trait loci analysis, which revealed in total 322,020 associations for 84,904 eSNPs (SNPs reported as eQTLs) targeting cytokines (Fig. 2e, upper panel (associations), bottom panel (eSNPs)). Tissue sample volume correlated with the number of GTEx associations, number of unique eSNPs, number of target genes and number of target cytokine genes (Pearson correlation coefficient ranged from 0.88 to 0.92). Twenty seven genes appeared to be non-eGenes (their expression level was not associated with any SNP). The top five genes, which were regulated by the largest numbers of eSNPs included C1QTNF4, GHR, IL20RB, IL34 and CCR1 (Fig. 2e, bottom panel, Supplementary  Table S1). Interestingly, we found an inverse correlation between Tau and the number of eSNPs influencing the corresponding gene expression (Spearman's rho = − 0.390, two sided correlation P value = 1.43E−12). More in depth analysis of eQTL statistics revealed that the expression level of tissue-specific genes (Tau ≥ 0.8, TSI ≥ 0.3) was regulated by a lower number of eQTLs compared to 'other' genes (Mean ± SD: 178.81 ± 250.90 vs. 321.35 ± 376.84, Mann-Whitney U-test P = 1.49E−06, Kolmogorov-Smirnov (KS) test P = 2.80E−05). The top five genes regulated by the largest numbers of eSNPs appeared to be non tissue-specific. After the exclusion of these five genes from the sample 'other genes' , the differences remained significant (Mann-Whitney U-test P = 5.51E−06, KS test P = 7.40E−05) (Fig. 2f). We noticed that among eQTLs available for 49 tissues, eQTLs targeting corresponding top tissue-specific genes were not identified in 31 tissues. In the set of 20 top tissue-specific genes, the largest numbers of eQTLs in corresponding tissues (tissue samples, n > 350) were found for the following genes: SLURP1 (Skin-Sun Exposed/Skin-Not Sun Exposed, 684/291 eQTLs), CNTF (Nerve-Tibial, 669), CMTM2 (Testis, 58), ADIPOQ (Adipose-Subcutaneous, 45), TNFRSF11B (Thyroid, 40) and CCL11 (Colon-Sigmoid, 39 eQTLs). For other top tissue-specific genes, the number of eQTLs, if any, was small, e.g., CXCL2 (Muscle-Skeletal, 14), CSF3R (Whole Blood, 13), TNFRSF11B (Artery-Tibial, 8). These SNPs did not represent single LD blocks, average r 2 for eQTLs targeting these genes was, respectively, 0.38, 0.33 and 0.26.
Direction of eQTL effects on target gene pairs. A significant proportion of cytokines (n = 96) were located in gene clusters (Supplementary Table S4). Expression SNPs within these clusters were often associated with expression levels of more than one cytokine. We explored the direction of allelic effects of all eQTL-gene Table 1. Genes regulating proteins with cytokine activity (GO:0005125), chemokine activity (GO:0008009), cytokine receptor activity (GO:0004896) and chemokine receptor activity (GO:0004950). *GO:0008083, growth factor activity.
Phenotypic spectrum of cytokine gene associations in the NHGRI-EBI GWAS catalog. In total, 244 cytokines were mapped in the NHGRI-EBI GWAS Catalog 11 (Supplementary Table S5). Diseases and traits associated with cytokine genes were classified with the use of Experimental Factor Ontology (EFO) 12 . GWAS traits were annotated according to disease type, disease by anatomical system (for non-oncological diseases), and type of measurements (Supplementary Table S5, Fig. 4). The majority of the associations were described by several classification units. In the generated data sets of the most numerous categories (Fig. 4a-c) and diseases ( Fig. 4d), the numbers of associations, SNPs and genes correlated (Pearson correlation coefficient range: set 'disease type' 0.93-0.99, set 'disease by anatomical system' 0.84-0.97, set 'measurements' 0.93-0.95, set 'diseases' 0.70-0.82). Based on the number of associations, we constructed one more set including 20 genes (Fig. 4e). In this set, the compared parameters strongly and disproportionately varied. The highest numbers of associations covered by shared classifications were found for those related to inflammatory diseases and connective tissue diseases (Fig. 4a); immune and digestive system diseases, as well as immune and musculoskeletal system diseases (Fig. 4b); and protein measurements and inflammatory biomarker measurements (Fig. 4c). In the set of the top ten diseases, the comparison of disease pairs Crohn's disease-Ulcerative colitis and Crohn's disease (or Ulcerative colitis)-Psoriasis produced a similar number of shared associations (Fig. 4d). Infectious diseases were mainly represented by chronic infections. Only one study reported associations for acute infectious diseases 13 .

The lists of GWAS trait-associated SNPs and their LD SNPs. A total of 3077 associations for 1760
unique SNPs were found for cytokine genes in the NHGRI-EBI GWAS Catalog. Constructing a data set of cytokine SNPs, we included intragenic or intergenic SNPs in mapped (not reported) genes (Supplementary  Table S5). We identified 891 intragenic SNPs and 869 intergenic SNPs, among the latter, 249 SNPs were located in regulatory regions. The majority of SNPs were obtained from Europeans. Next, we conducted an LD analy-  Fig. 5a,b. Significant discrepancies in the proportions of index and LD SNPs were observed between EUR and ASN (P = 0.004). Other comparisons did not yield significant results due to smaller differences and/or smaller sample sizes.

Functional annotation of index and LD SNPs. To compare functional characteristics of index and LD
SNPs, we used IW-Scoring: an integrative weighted scoring framework to annotate and prioritize noncoding variations 14 (Fig. 5c  www.nature.com/scientificreports/ cutoff values. A total of 75 SNPs had global Fst rank scores > 2 (scores ranged from 0.404 to 0.668). Among them, the majority of GWAS associations, mostly with anthropometric measurements, were found for the two SNPs rs143384 and rs224333 in high LD (r 2 = 0.93 in all populations) in the GDF5 gene (Fig. 6a). Similar effects were also observed for GDF5 rs143384 and rs224333 (scores ranged from 0.649 to 0.714) when comparing Fst CEU (Northern European) vs. YRI (West African). Five SNPs in high LD (r 2 = 0.90 in all populations) in or near the TGFB2 gene showed significant Fst values for the CEU-CHB (East Asian) pair (Supplementary Table S8). Only ten SNPs had rank scores > 2 for the iHS CEU score; among them three tightly linked SNPs (r 2 = 0.86) were located in or near IL18R1 gene (Fig. 6b). The top IL18R1 SNP rs2001461 with an iHS CEU score of 4.544 was associated with blood protein (IL18R1) measurement, while two other SNPs were associated with serum ST2 (the IL1RL1 gene product) measurement (rs1420103) and atopic eczema (rs6419573). Some other IL1RL1/ IL18R1 SNPs, which were not reported in GWAS Catalog studies also had high scores (Fig. 6b), however, this could be the result of hitchhiking effects 18 Table S9). Compared to SNPs without eQTL effects (non-eSNPs), eSNPs were more often located in intergenic regulatory regions and in introns, while non-eSNPs were more frequently found in non-regulatory intergenic regions (Fig. 7a, upper panel). The most associations in the GTEx database, per one eSNP, were found for splice region variants, TF (transcription factor) binding site variants and 5′UTR variants (Fig. 7a, middle panel).
In the set of eSNPs, positive IW-score (K10) mean values were revealed for TF binding site variants, synonymous variants, 5′UTR and 3′UTR variants (Fig. 7a, bottom panel). In the context of the direction of eQTL effects on target gene pairs, quite a lot of these effects were unidirectional for one gene pair and bidirectional for another gene pair. Both uni-and bidirectional associations were found for 78 eQTLs from, respectively, 188 and 103 eQTLs with unidirectional and bidirectional effects on target gene pairs. The Circos plot demonstrating the numbers of GWAS Catalog associations, unique GWAS Catalog SNPs, unique eSNPs, eSNP associations in the GTEx v.8 database and target genes, depicted according to chromosome regions is provided in Fig. 7b. The largest number of eSNPs was reported for the region 2q14.1. For this region, we found 220 GWAS Catalog associations and 188 eSNPs targeting 27 genes, among them nine cytokines (IL36RN, IL1A, IL37, IL1B, IL1F10, IL36A, IL36G, IL36B, IL1RN). The majority (202/213) of eSNP associations were detected for different types of measurements, primarily, for interleukin-1 beta measurement. The largest numbers of eSNP disease associations were linked to the following regions: 2q12.1 (58 associations, 32 eSNPs), 10p15.1 (31 associations, 12 eSNPs), 1q21.3 and 5q22.1 (27 associations, 11 eSNPs and 5 eSNPs, respectively). The top five diseases associated with all GWAS Catalog eSNPs were: asthma, Crohn's disease, inflammatory bowel disease, multiple sclerosis and rheumatoid arthritis (Fig. 7c).
Next, we looked at the specificity of cytokine genes eQTLs for each trait in the GTEx panel (Supplementary  Table S10). The most significant associations (Pexp < E−10) were found for the following disease-tissue pairs: (1) cardiovascular, IL6R (coronary artery disease, atrial fibrillation, abdominal aortic aneurysm) and BMP1 (coronary artery disease); (2) digestive, CCR1 (celiac disease), TSLP (eosinophilic esophagitis) and IFNGR2 (ulcerative colitis, Crohn's disease, sclerosing cholangitis, inflammatory bowel disease); (3) endocrine, NRG1 (hypothyroidism), GFRA2 (type II diabetes mellitus) and IFNGR2 (sclerosing cholangitis); (4) immune/hematologic, CCL20 www.nature.com/scientificreports/ and CXCL5 (inflammatory bowel disease, ulcerative colitis), CCR1 (celiac disease, Behcet's syndrome, AIDS), GDF15 and IL12RB2 (systemic lupus erythematosus), IFNGR2 (multiple sclerosis) and TNFSF15 (Crohn's disease); (5) integumentary, IFNLR1 (psoriasis) and IL1A (eczema); (6) musculoskeletal, IFNGR2 and IFNLR1 (ankylosing spondylitis); (7) nervous, CCL20 (eating disorder), FLT3 (Tourette syndrome), IFNAR1 (narcolepsy with cataplexy), IL18R1 (anterior uveitis and leprosy); (8) respiratory, IL1RL1 and IL18R1 (asthma). The subsequent analysis of GWAS disease-relevant tissue-specific eQTLs showed that eQTLs linked to nervous system diseases were enriched (had lower expression P values) in nervous compared to integumentary tissues; and eQTLs observed in cardiovascular diseases were enriched in cardiovascular against nervous tissues (Supplementary Table S10). We further used the STRING database 19 for two types of enrichment analysis. First, we retrieved protein-protein interaction information for a total of 999 target genes, of which 655 genes encoded proteins found to be involved in protein interactions (Supplementary Table S11). The list of 655 genes was analyzed for the enriched categories from the ARCHS4 tissue database via the Enrichr tool 20 . 'Macrophage' was the top term followed by five more tissues, namely, lung (bulk tissue), colon (bulk tissue), ileum (bulk), valve and gastric epithelial cell (Fig. 7d). Since interacting proteins participate in many functions determining, in particular, tissue phenotypes in health and disease 21 , the aforementioned results are in agreement with the fact that many GWAS Catalog diseases found for cytokine genes were represented by inflammatory diseases and by respiratory and digestive system diseases. Second, we performed GO enrichment analysis for the following gene sets: target cytokines (n = 178), target non-cytokine genes (n = 821) and the combined group of 999 genes. No enriched terms were retrieved for the group of non-cytokine genes. Enriched biological process (BP) annotations included 838 and 457 annotations for the set of cytokines and for the whole set, respectively (Supplementary Table S12). Among annotations found for the whole set, 57 annotations were unique, of which the top ten annotations were related to different types of regulation, including regulation of cell communication and migration (Fig. 7e). Subsequent processing of GO annotations, with the use of the REVIGO tool 22 , allowed selection of common and unique top-level functional categories for the set of cytokines and for the whole set (Supplementary Table S13). In the latter set, non-cytokine genes might contribute to cytokine network, in particular, participating in 'apoptotic cell clearance' , since clearance defects are associated with the processes underlying inflammation and autoimmunity 23 .
Next, we used the Jaccard index to measure pairwise tissue similarity for eQTL profiles by matching eQTLs with their target genes and NES (Normalized effect size) direction. The Jaccard index was calculated for three sets: only cytokines as target genes for GWAS Catalog eSNPs, all target genes for our set of GWAS Catalog eSNPs, and cytokines as target genes at the genome-wide level (in total, targets for 84,904 eSNPs) (Supplementary Table S14). In the set of cytokines as target genes for GWAS Catalog eSNPs, the Jaccard index ranged from 0 to 42.33 (mean ± SD, 8.98 ± 8.00). It was somewhat higher when considering the whole pool of target genes (range 0.45 to 44.94, mean ± SD, 14.34 ± 6.78) (Fig. 7f). In the third set, the Jaccard index had values closer to those found for the first set (range 0.31-34.29, mean ± SD, 9.78 ± 6.15). Overall, the mean Jaccard index was low in all sets. The highest sharing of eQTLs across tissues was observed within the same tissue categories.

Discussion
Gene lists of human cytokines vary from 132 to 261 genes depending on whether cytokine receptors are included 24 . Using the QuickGo database and the search terms 'cytokine/chemokine activity' and 'cytokine/ chemokine receptor activity' , we generated a list of 314 cytokines, which comprised an extensive range of cytokines/chemokines and their receptors. We aimed at collection, analysis and interpretation of data on cytokines focused on their tissue-specific expression, eQTLs and GWAS diseases and traits.
Our results on high tissue-specificity of cytokines are in agreement with the literature data. Inflammation initiation and resolution are mediated by pathways involving different cytokines, which work together with other tissue-specific signals depending on the composition of the relevant tissue and the microbial load 24,25 . The results of inverse correlation between cytokine tissue-specificity and the number of eQTLs targeting their expression should be assessed with caution since there is a positive correlation between the number of eQTLs and the tissue sample size, as well as gene length and the number of SNPs in tight LD with the top eSNP. Nevertheless, gene expression is a trait that is often under stabilizing selection, which plays an important role in limiting discrepancy in gene-expression levels 26,27 substantial for the maintenance of tissue specificity in the expression regulatory framework 28 . Moreover, eQTLs can be predominantly targets of negative selection, in particular those affecting genes essential for tissue function, i.e. tissue-specific genes 29,30 . In this context, our results on a smaller number of eQTLs in tissue-specific cytokine genes, in comparison with other genes, seem biologically plausible. We also demonstrated many uni-and bidirectional changes in expression levels of target cytokine pairs associated with the same SNPs. A dominance of unidirectional correlations was seen in large gene clusters, mini-clusters and individual gene pairs. Gene clusters are regions of co-localized genes, which were formed in the course of evolution due to duplication of a single gene. The two newly formed copies usually developed specialized functions without losing a common primary function 31 . Cytokines gene clustering was intended to provide a consistent response to inflammatory stimuli 32 , while complex interplay of eQTLs influencing expression of genes in both directions could enable fine-tuning of the inflammatory response.
We classified all GWAS diseases and traits linked to cytokine SNPs with the EFO classification, described the top classification units for mapped traits and demonstrated their pairwise overlappings. The disease spectrum mainly included chronic immune-related disorders with a wide representation of autoimmune diseases, while associations found for infectious diseases, especially acute conditions, were relatively scarce.
It is accepted that GWAS-identified SNPs are usually considered as markers, and other SNPs in high LD with the index SNPs may be causal for the disease 33 . The comparative analysis of index SNPs and LD SNPs revealed higher functional scores for index SNPs, however, the influence of individual non-GWAS functional Scientific RepoRtS | (2020) 10:14740 | https://doi.org/10.1038/s41598-020-71018-6 www.nature.com/scientificreports/ SNPs in different degrees of LD with the index SNPs can be essential. We reported a larger proportion of LD SNPs in Asians vs. Europeans. This finding may be explained by the fact that GWAS SNPs were more often located in intragenic regions in Asian (52.21%) than in European populations (46.27%), since an excess of SNPs in strong LD is an inherent property of intragenic SNPs 34 . Immune-related genes and cytokines, in particular, are frequently targets for natural selection in humans 35 , therefore, we performed two natural selection tests, Fst and iHS, and found several SNPs subjected to high selective pressures. Given the results of the Fst test, the SNPs in or near the GDF5 and TGFB2 genes were mainly associated with anthropometric measurements related to height and BMI. Height is one of the best known candidates for polygenic selection in humans, especially in Europeans, while data for BMI are contradictory 36,37 . The GDF5 gene product regulates bone and cartilage formation; recent selection of growth phenotypes affected GDF5 alleles, which were also associated with an increased arthritis susceptibility, especially in East Asians 38,39 . In our study, the GDF5 SNPs, rs143384 and rs224333, were associated with height-and BMI-related phenotypes in different populations, however, no associations with arthritis were revealed for these SNPs under selective pressure. The TGFB2 gene product also has growth factor activity. According to GO annotations, it participates in skeletal system development. In our study, significant Fst results were observed only for the CEU-CHB pair of populations and the SNPs under selective pressure had almost no effect on the TGFB2 expression. Thus far, we did not find literature data on the involvement of TGFB2 SNPs in the selective processes. The iHS test, aimed at defining evidence of recent positive selection, detected selective sweeps for three tightly linked SNPs in the IL18R1 gene, which encodes a cytokine receptor from the interleukin 1 receptor family. It has been previously discussed that the conserved across evolutionary branches mechanism of regulating IL18 signaling might represent a target for selective pressure 40 . This assumption is consistent with the fact that the top IL18R1 SNP rs2001461 (iHS CEU score 4.54) was associated with the IL18R1 expression (GWAS trait P value = 3.00E−129).
Our eQTL analysis of GWAS SNPs revealed the largest number of associations in the GTEx database for splice region variants, TF binding site variants and 5′UTR variants, i.e., the regions functionally relevant to gene expression and its regulation 41 . The analysis of tissue specificity of cytokine genes eQTLs for each trait in the GTEx panel was aimed at highlighting the most significant findings for disease-tissue associations. Some eQTLs influenced the expression levels of many (more than 20) different cis-genes in multiple tissues thus complicating the eQTL data interpretation. The relevance of eQTLs may be supported by the results of the tissue-and disease-specific enrichment analysis 42 , however, the specific level of enrichment was observed in only two sets of comparisons. Lack of enrichment results for the majority of tissue-specific effects of eQTLs can be explained by a lack of statistical power and pleotropic effects of many SNPs. The true absence of tissue-specific effects for some complex traits is also discussed 43 . The role of eGenes represented by cytokines in comparison with the role of other eGenes was highlighted in the gene set enrichment analyses, which showed that cytokines were involved in infection and inflammation-related biological processes, while other genes in the whole set of target genes were mainly engaged in regulation, cell communication and migration. All together these data imply that cytokines expression variability fundamentally contribute to the molecular origins of complex traits and immune-mediated diseases. Tissue similarity in eQTL profiles of GWAS trait-associated SNPs measured by Jaccard coefficients showed high eQTL specificity. These results are in agreement with the fact that tissue-specific eGenes are more often annotated as disease genes than tissue-shared eGenes 42 . Tissue-specific genes are relevant to tissue biology and disease 29 . Among other regulatory mechanisms, cytokine eQTLs specificity could contribute to cytokine tissue expression specificity. The GTEx database provides data on tissue expression in healthy tissues, however, the validity of the approach using GTEx data for translational research in medical science has been recently confirmed in the study of drug targets, in which druggable genes were expressed in disease-relevant tissues in a healthy state in 87% of cases 44 .
The main limitations of this study are characteristic for secondary investigations using data as they are in original resources. The enrichment analysis of GTEx data was limited to the results presented with q-value threshold 0.05. GTEx eQTL effects may be gender-and age-dependent and linked to population structure, thus being subjected to confounding 7,45 .
In conclusion, we generated a list of 314 cytokines and characterized their tissue-expression specificity, eQTLs and GWAS diseases and traits. Several findings go beyond the scope of this study and may be interesting for future research directions. (1) Correlation between cytokine gene expression levels in different GTEx tissues was high, however, the lowest levels of correlations were revealed for whole blood and other hemic and immune-related cells and tissues, between themselves and between other tissues. These data are in agreement with GTEx data for the whole set of GTEx genes 7 and suggest that using blood as a surrogate tissue for transcription analysis has marked limitations for translational research. (2) Low Jaccard index for eQTL-based tissue similarity reflects eQTL tissue-specificity. This conclusion is supported by literature data demonstrating that, if possible, diseaserelevant tissue should be used for eQTL-based transcription analysis 46 . Other findings and observations are more specific to the aim of the study. (3) Acute immune-mediated conditions are scarcely represented among GWAS traits, possibly due to insufficient research and/or complexity and multifactoriality. (4) Natural selection analysis identified SNPs in the GDF5 gene (confirmatory information) and IL18R1 gene (new data) subjected to positive selection. (5) GWAS SNPs with eQTL effects affected expression levels of many eGenes in different tissues, however, it was cytokines expression variability that fundamentally contributed to the molecular origins of considered immune-mediated conditions.

Materials and methods
Hand-curated list of genes encoding proteins with cytokine and cytokine receptor activity. We generated a list of genes encoding proteins with cytokine/chemokine and cytokine/chemokine receptor activity employing the QuickGo database-a web-based tool of the European Bioinformatics Insti- www.nature.com/scientificreports/ tute (EMBL-EBI) for Gene Ontology searching 7 . Our searching for the terms cytokine activity (GO:0005125), chemokine activity (GO:0008009), cytokine receptor activity (GO:0004896) and chemokine receptor activity (GO:0004950) provided 40,607, 11,241, 29,071 and 9474 annotations respectively (last access August, 2019). After the removal of non-human taxon entries, proteins without annotations in SwissProt and entries without a gene symbol approved by the HUGO Gene Nomenclature Committee, we constructed a final set of 314 unique genes.
GTEx information on cytokine genes in the whole genome context: data extraction and analysis. For our gene-set analyses, we downloaded results from the GTEx database Analysis Release V8 8 . Gene tissue expression data presented as median TPM (Transcript Per Million) were available for 54 tissues (in total 948 donors). Cis-eQTL information was provided for 49 tissues (in total 838 donors) with significant eQTL signals determined with a Q-value threshold. Two metrics estimating tissue specificity were applied, Tau 9 and TSI 10 , both calculating tissue specificity based on the information of a given gene expression in each tissue and its maximal expression across all tissues. Both metrics vary from 0, indicating that expression is constant in all tissues, to 1, pointing out that expression is specific to a single tissue.
To assess the direction of an eQTL effect on target gene pairs, we formed clusters of physically co-localized genes for a given chromosome region. These clusters included gene pairs and SNPs matched by tissuespecific expression. LD analysis was carried out by calculating the squared correlation coefficient (r 2 ) for each pair of SNPs of interest. LD patterns were analyzed in populations of European descent (CEU, UtAh residents from North and West Europe; TSI, Toscani in Italia; FIN, Finnish in Finland; GBR, British in England and Scotland; IBS, Iberian population in Spain). LD analyses in this section and throughout the study were done with the LDlink resource 47 .
The NHGRI-EBI GWAS Catalog data extraction and analysis. We extracted data for cytokine genes from the NHGRI-EBI GWAS Catalog (last access August, 2019) 11 . Associations reported in the GWAS Catalog were annotated with the use of EFO (Experimental Factor Ontology) 12 . We used HaploReg v4 48 to construct two data sets including information for GWAS SNPs (index SNPs) and SNPs in high LD with the index SNPs. This information was obtained via haploR package 49 . LD SNPs were selected based on a threshold r 2 > 0.8 and were matched by population with index SNPs. Only index SNPs were considered for mixed and non-indicated populations, as well as for populations which could not be attributed to any of the HaploReg populations: European, Asian (Chinese, Japanese, Vietnamese), African (including African Americans) and American (Admixed American). To annotate index and LD SNPs we used the IW-scoring tool, which was developed to annotate and rank non-coding genetic variants by their putative functional importance 14 . Among available outputs, we focused on IW-score (K10), which aggregates scores from ten state-of-the-art functional prediction tools for known genetic variants. We also considered some additional annotations provided by HaploReg v4 48 and SNPnexus 50 for noncoding and coding variants (annotations at the genomic region level, as potential regulatory sequences and the effect of the amino acid change on protein function).
Detection of signals of positive selection. Two measures of positive selection signals for GWAS SNPs in cytokine genes, Fst (Fixation index) and iHS (Integrated Haplotype Score) were subjected to analysis via the 1000 Genome Selection Browser 1.0 15 . This resource includes data on populations of West African (YRI), Northern European (CEU) and East Asian (CHB) ancestry. Natural selection statistics are provided as the absolute scores and rank scores representing − log10 of the P value at 0.01 FDR for the SNP compared to others in the whole-genome context. Rank scores > 2.0 are considered significant 15 . Six sets, Fst CEU versus CHB, Fst CEU versus YRI, Fst Glob, iHS CEU, iHS CHB and iHS YRI were considered.

Functional analysis of GWAS SNPs with eQTL effect.
To investigate GWAS SNPs affecting gene expression levels, we also used the GTEx database Analysis Release V8. Among target genes, we identified gene sets significantly enriched in protein-protein interactions (PPI) and GO (Gene Ontology) terms in the STRING database 19 . For each PPI pair, the combined score > 0.4 was applied as the cutoff criterion. In the set of GO terms we included only terms with at least three genes per category. The resulting gene set was analyzed for tissue specific enrichment by ' ARCHS4 Tissues' in Enrichr 20 . For gene set enrichment analyses we set the false discovery rate (FDR) threshold as 0.05. The REVIGO tool was applied to remove redundancies in GO terms and to select the cluster GO representatives 22 . Tissue-sharedness in eQTL effects was assessed with the Jaccard similarity index, which measures similarity between two sets as the ratio of their intersection to their union. Tissue pairwise overlap was registered for eQTLs if they influenced the same target genes with the same direction of the effect.

Statistical analysis.
For categorical variables, we used Pearson's chi-square with Yates's correction/Fisher's exact test and displayed P values corrected for multiply testing (FDR test). For continuous variables, we applied two nonparametric tests, the Mann-Whitney U test and the KS tests. The Mann-Whitney test computes a P value depending on the discrepancy between the mean ranks of the compared groups, while the KS test compares the cumulative distribution of the two data sets. The Mann Whitney test is more appropriate for sample sizes < 50 samples.
Ethics statement. Ethical review and approval was not required for the secondary analysis of public data in accordance with the local legislation and institutional requirements.