Introduction

Atopic dermatitis (AD) is one of the most common chronic dermatological diseases. The prevalence of AD reported in children worldwide in 2019 was 10–20% and is increasing1,2. AD is characterized by skin lesions and pruritus; although it is not life-threatening, it severely affects quality of life. It is sometimes accompanied by thyroid autoimmunity, mental health problems, and cancerous diseases, with or without infectious complications3,4,5. Currently, monoclonal antibodies are used to treat severe AD, while topical steroids and antihistamines are the first-line treatment for mild-to-moderate AD6. However, long-term use of topical steroids or antihistamines can cause unwanted side effects such as skin thinning, melanocyte inhibition, and gastrointestinal effects7,8. Therefore, alternative strategies for treating mild-to-moderate AD are necessary.

Several genetic risk factors or causal genes for AD have been identified by functional and computational studies9,10. Genetic variants associated with filaggrin (FLG), ovo-like transcriptional repressor 1 (OVOL1), and interleukin 6 receptor (IL6R) were suggested as risk loci for AD by a multi-ancestry genome-wide association study (GWAS)9. Other functional or clinical studies suggested IL-4, IL-13, toll-like receptor 2 (TLR2), matrix metalloproteinase 9 (MMP9), and MMP10 as susceptibility genes for AD10,11. However, the underlying mechanisms of AD pathogenesis have not yet been elucidated.

Since general GWAS utilizes large-scale genotype data to identify genetic variants that influence disease pathogenesis, the method is less optimized for interpreting multiple gene expression changes caused by variants in non-coding regions. Recently, transcriptome-wide association study (TWAS) was suggested as an improved approach to implement gene expression imputation using GWAS results for better interpretation12,13. TWAS predicts the gene expression levels of phenotypes by combining genotypes and gene expression weights calculated using cis-expression quantitative trait loci (eQTLs) with multiple prediction models. TWAS has provided new insights into the underlying genetic/transcriptomic mechanisms of several diseases and phenotypes, including Alzheimer’s disease, pancreatic cancer, and neutrophil development14,15,16.

We conducted TWAS using the largest AD GWAS dataset available to date, obtained from a European population. A transcriptome meta-analysis of microarray and RNA sequencing (RNA-seq) datasets was performed to identify gene expression changes that could not be explained solely by the genetic background. The connectivity between gene expression signatures from TWAS and the transcriptome meta-analysis was assessed by network analysis. Finally, we performed in silico drug repositioning by combining the results from TWAS and meta-analysis to identify alternative therapeutic options for AD. To the best of our knowledge, this is the first integrative analysis of AD to combine TWAS and meta-analysis. We believe that our results can help expand knowledge of the biological mechanisms of AD pathogenesis and support the development of therapeutic options for AD.

Results

Enrichment analysis of GWAS signals from AD GWAS summary statistics

To examine the genetic landscape of AD, we used the UK Biobank GWAS data consisting of 279,476 controls and 9831 AD patients. First, we examined whether the GWAS signals for AD were specifically enriched in certain tissues or cell types using functional mapping and annotation of genetic associations (FUMA)17. We found that the cis-regulated genes of GWAS signals were mainly over-expressed in skin tissues (Supplementary Fig. S1). Next, tissue- or cell-specific heritability was analyzed using linkage disequilibrium (LD) score regression applied to specifically expressed genes (LDSC-SEG) with the multi-tissue expression dataset and multi-tissue chromatin dataset, following Finucane et al.18. Heritability of AD GWAS signals on the multi-tissue expression data showed significant enrichment (false discovery rate (FDR) < 0.05) in the blood and immune-related tissues (Supplementary Fig. S2a; Supplementary Data 1), and this pattern was replicated in the multi-tissue chromatin dataset (Supplementary Fig. S2b; Supplementary Data 2).

Transcriptome-wide associations for AD

To identify susceptibility genes for AD, we performed TWAS with functional summary-based imputation (FUSION), using eQTL panels from nine tissues that can cover the systemic features of AD. The tissue panels were skin-sun exposed, skin-not sun exposed, cells-transformed fibroblast, spleen, thyroid, whole blood, cells-Epstein–Barr virus (EBV)-transformed lymphocytes, Netherlands Twin Registry (NTR) blood, and Young Finns Study (YFS) blood panel. Among the total of 52,860 associations, 25 genes in 31 loci remained statistically significant after using a Bonferroni-corrected threshold (P < 0.05/number of associations (52,860) = ~9.46 × 10−7) (Fig. 1a, Table 1, and Supplementary Data 3). Although TWAS signals showed the highest mean effect size in the skin-not sun-exposed panel, this was not dramatically higher than the mean effect sizes of other panels, indicating that the genetic features of AD may evenly affect the gene expression levels of nine tissue panels (Supplementary Fig. S3). The numbers of significant associations were six in skin-sun exposed, five in skin-not sun exposed, five in cells-transformed fibroblast, one in spleen, seven in thyroid, eight in whole blood, one in cells-EBV-transformed lymphocytes, two in NTR blood, and three in YFS blood panel. These results may represent the tissue-specific genetic features of AD in skin functions, immunological abnormalities, and thyroid autoimmunity.
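The Bonferroni threshold and the per-panel counts quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original analysis pipeline; the function and variable names are illustrative, and only the numbers (0.05, 52,860, and the per-panel counts) come from the text.

```python
# Sketch: reproduce the Bonferroni-corrected TWAS threshold quoted in the text.
# Function and variable names are illustrative, not taken from FUSION.

def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Return the per-test P-value threshold under Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests

N_ASSOCIATIONS = 52_860  # total TWAS associations reported in the text
threshold = bonferroni_threshold(0.05, N_ASSOCIATIONS)
print(f"{threshold:.2e}")  # 9.46e-07

# Significant associations per panel, as listed in the text.
hits_per_panel = {
    "skin-sun exposed": 6,
    "skin-not sun exposed": 5,
    "cells-transformed fibroblast": 5,
    "spleen": 1,
    "thyroid": 7,
    "whole blood": 8,
    "cells-EBV-transformed lymphocytes": 1,
    "NTR blood": 2,
    "YFS blood": 3,
}
total_hits = sum(hits_per_panel.values())
print(total_hits)  # 38
```

The per-panel counts sum to 38 associations, which is more than the 25 significant genes because a gene can be significant in more than one panel.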

Fig. 1: Overall results from the TWAS and post-analysis.

a A Manhattan plot showing the TWAS results obtained using the FUSION software. The red line indicates a Bonferroni-corrected threshold (P < 9.46 × 10−7), and the yellow dots correspond to the 25 TWAS-significant genes. b A heatmap showing the result of TWAS-GSEA. The color of each cell indicates the number of available genes involved in the gene set divided by the total number of the genes in the gene set. The cells marked with asterisks are the significantly enriched gene sets in the corresponding tissue panels.

Table 1 List of significantly associated genes from TWAS.

Among these genes, 18 well-known AD risk genes such as FLG, OVOL1, and IL6R were significantly associated with TWAS signals for AD, confirming the validity of our methods. We identified three non-coding RNAs significantly associated with AD (AC007278.2, AC007248.7, and RP11-85K15.2) and four novel AD genetic risk genes, leucine rich repeat and Ig domain containing 4 (LINGO4), regulatory factor X5 (RFX5), prolyl-4 hydroxylase subunit alpha 2 (P4HA2), and RNA binding motif protein 17 (RBM17), which were not identified in previous GWAS studies. Among the 25 significantly associated TWAS genes, the majority (76%), including previously reported and novel TWAS genes, remained statistically significant after the permutation test (P < 0.05), suggesting that our TWAS genes are statistically robust findings.

Then, we compared the TWAS results with two other gene prioritization methods: the multi-marker analysis of genomic annotation (MAGMA) and the COLOC method19,20. While MAGMA analyzes the associated genes based on their chromosomal positions, COLOC is an R package for analyzing colocalization events to calculate posterior probabilities (PP) for hypotheses 0–4 (H0–H4). Using MAGMA with a Bonferroni-corrected threshold (P < 2.64 × 10−6), we detected 68 genes significantly associated with AD, 12 of which overlapped with the TWAS genes (Supplementary Fig. S4a). The COLOC results showed 27 colocalized signals for AD (PP3 + PP4 > 0.8 and PP4/PP3 > 2), among which more than half (15/27) were also prioritized in TWAS (Supplementary Fig. S4b, c). Among the 27 genes from COLOC, 13 overlapped with the results from MAGMA (Supplementary Fig. S4c). Nine genes were prioritized by all three methods: OVOL1, ARFRP1, PPP2R3C, FAM177A1, CLEC16A, SLC2A4RG, ZBTB46, IL6R, and IL18RAP (Supplementary Fig. S4c).

To analyze whether novel TWAS genes were jointly associated with AD, a conditional and joint analysis using FUSION was conducted with the TWAS results (Supplementary Fig. S5a–c and Table 2). Among the four novel genes, LINGO4, RFX5, and RBM17 remained jointly significant after the expected gene expressions were removed. A subsequent analysis using the fine-mapping of causal gene sets (FOCUS) was performed to determine the genetic causality of three novel jointly significant genes in AD pathogenesis. Two novel genes, LINGO4 and RBM17, were included in credible sets with significant cross-validation P-values (P < 0.05) in FOCUS and their posterior inclusion probabilities (PIPs) indicating the nominal probability of causality were calculated (Supplementary Fig. S6a, b and Table 3). LINGO4 was significantly detected in two genotype-tissue expression (GTEx) tissue panels: skin-sun exposed (PIP = 0.163) and skin-not sun exposed (PIP = 1). RBM17 was also significantly detected in the skin-sun exposed panel (PIP = 0.695).

Table 2 Conditional and joint analysis results of novel TWAS genes in FUSION.
Table 3 Fine-mapping results of the novel TWAS genes using FOCUS.

Overall TWAS signals were analyzed with TWAS-gene set enrichment analysis (TWAS-GSEA) software to determine their enriched biological pathway. Fifteen gene sets among the Gene Ontology–Biological Process (GO-BP) and Kyoto Encyclopedia of Genes and Genomes (KEGG) gene sets were significantly enriched with TWAS signals across five tissue panels: skin-sun exposed, skin-not sun exposed, YFS blood, whole blood, and cells-EBV-transformed lymphocytes (Fig. 1b and Supplementary Data 4). TWAS signals were enriched in cornified envelope and peptide cross-linking in skin panels, which are well-known representative molecular characteristics of AD. TWAS signals from YFS blood and whole blood panels were significantly enriched in cytokine production (type 1 helper T cell activation) and hedgehog signaling pathways, which supports the notion that T cell-mediated immune responses are crucial pathogenic mechanisms of AD. In addition, we identified significant enrichment in TWAS signals in immune cell differentiation and meiotic cell cycle regulation from the cells-EBV-transformed lymphocytes panel. Together, the functional annotation of TWAS signals suggested that they mostly contribute to the abnormal activation of immune responses and the development of AD skin lesions.

Transcriptome meta-analysis for AD

Due to the complicated nature of AD, there may be transcriptional changes that can be only marginally explained by genetic variation. Therefore, we conducted a transcriptome meta-analysis to find transcriptional changes caused by non-genetic factors. We collected skin transcriptome datasets (control: 93; AD: 140) from five studies on five different experimental platforms from public databases (Table 4). Then, we integrated the datasets into a merged set, removing the batch effects between individual studies. Principal component analysis (PCA) was conducted to verify that major variances between samples were mainly due to disease state (Fig. 2a).

Table 4 List of the transcriptome datasets used for transcriptome meta-analysis.
Fig. 2: Correction of batch effects and identification of meta-signatures for AD.

a A scatter plot displaying the PCA results using all genes after the batch effect correction. The shapes of the points indicate the samples from each dataset. Green and red color correspond to the healthy control samples and AD samples, respectively. b A Venn-diagram comparing the DEGs from single studies with meta-signatures. c A heatmap of expression profiles of meta-signatures across the samples.

A transcriptome meta-analysis for identifying differentially expressed genes (DEGs) between AD and control groups was conducted using the batch effect-corrected merged set. Using the merged data, we obtained robust genetic features (meta-signatures) with increased statistical power. We identified 268 meta-signatures consisting of 196 upregulated and 72 downregulated DEGs (FDR < 0.01 and |log2 fold-change (FC)| > 1). We found that 226 of the meta-signature genes were also detected in at least one of the single datasets, while 42 were identified only in the meta-analysis (Fig. 2b). There was a clear distinction in gene expression profiles between the control and AD samples (Fig. 2c).
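The meta-signature rule (FDR < 0.01 and |log2FC| > 1, with the sign of log2FC giving the direction) can be written as a small filter. This is a minimal sketch under stated assumptions: the `GeneStat` record, the `classify_deg` helper, and the gene names and values are hypothetical placeholders, not results from the study.

```python
from typing import NamedTuple

class GeneStat(NamedTuple):
    """Hypothetical per-gene summary from a differential expression test."""
    gene: str
    log2fc: float
    fdr: float

def classify_deg(stat: GeneStat, fdr_cutoff: float = 0.01,
                 lfc_cutoff: float = 1.0) -> str:
    """Label a gene 'up', 'down', or 'ns' using the cutoffs given in the text."""
    if stat.fdr < fdr_cutoff and abs(stat.log2fc) > lfc_cutoff:
        return "up" if stat.log2fc > 0 else "down"
    return "ns"  # not a meta-signature

# Placeholder values for illustration only.
stats = [
    GeneStat("GENE_A", 2.3, 1e-5),    # strong upregulation
    GeneStat("GENE_B", -1.4, 2e-4),   # strong downregulation
    GeneStat("GENE_C", 0.4, 1e-6),    # significant but small effect
    GeneStat("GENE_D", 1.8, 0.2),     # large effect but not significant
]
labels = {s.gene: classify_deg(s) for s in stats}
print(labels)
```

Requiring both conditions is what separates meta-signatures from genes that are statistically significant but change only slightly.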

Among 268 meta-signatures, we identified five novel genes not previously reported as having associations with AD pathogenesis (Table 5). Chromosome 1 open reading frame 162 (C1orf162) was detected as a positively regulated gene and expresses a protein located in the hydrophobic region of the cellular membrane21,22. Nocturnin (NOCT) encodes a protein that is crucial in the circadian system23. The multi-functioning gene TP53-induced glycolysis regulatory phosphatase (TIGAR), known for its role in the p53/TIGAR signaling pathway, was also significantly upregulated24. There were two downregulated novel genes: scinderin (SCIN) and BOC cell adhesion associated, oncogene regulated (BOC). SCIN is associated with skin development or epithelial–mesenchymal transition, whereas BOC is involved in developmental pathways such as hedgehog pathway or neuronal differentiation25,26,27,28.

Table 5 List of the novel genes from the transcriptome meta-analysis.

We examined the expression profiles of TWAS genes in the meta-analysis results. Among the 25 TWAS genes, 16 had corresponding probes available in our merged set. Only FLG was present in both the TWAS signals and the meta-signatures. Other TWAS genes, except RBM17 (FDR = 0.258, log2FC = 0.022), showed marginally significant differential expression (FDR < 0.01, |log2FC| > 0) in our meta-analysis (Supplementary Table S1). Although there was only one direct overlap between TWAS genes and meta-signatures, we observed significant correlations between the two at the gene-set level (Supplementary Fig. S7). In line with the significant enrichment of TWAS results in meta-signatures, the functional enrichment results of the meta-analysis agreed well with the TWAS-GSEA results. We found that 80% of the gene sets that were significantly enriched with TWAS signals were also enriched with the pre-ranked gene lists generated using the transcriptome meta-analysis (Supplementary Data 5). Together, the meta-analysis using published transcriptome data supported the reliability of the TWAS genes and identified five novel genes.

Network construction and sub-network analysis for integrating TWAS and meta-analysis

To systematically assess the connections between TWAS genes and meta-signatures, we conducted network analysis using both sets of genes as input nodes in the search tool for the retrieval of interacting genes (STRING) database (Supplementary Fig. S8a). After constructing protein–protein interaction (PPI) networks composed of 243 nodes, we analyzed the sub-network clusters to examine the local connections between TWAS genes and meta-signatures. Networks were clustered into 12 sub-networks, and the three clusters with the top 25% rank scores were regarded as the main ones (Supplementary Fig. S8b).

Cluster 1 showed the highest rank score (12.383) with 48 genes that included three TWAS genes, 44 meta-signature genes, and one gene from the STRING database (Fig. 3a). In cluster 1, marker of proliferation Ki-67 (MKI67) was the hub gene, with a degree of 30 and a betweenness centrality (BC) of 0.254. Cluster 2 contained 30 upregulated and two downregulated meta-signature genes and had a rank score of 11.355 (Fig. 3b). The hub gene for cluster 2 was interferon regulatory factor 7 (IRF7), with a degree of 25 and a BC of 0.272. Cluster 3 had a rank score of 11.13 and contained the most nodes (116): seven TWAS genes, 107 meta-signature genes, FLG (which was present in both the TWAS genes and the meta-signatures), and one gene added by the STRING database. MMP9, an upregulated meta-signature gene, was the hub gene for cluster 3, with a degree of 38 and a BC of 0.192 (Fig. 3c). Thus, TWAS genes and meta-signatures were connected both in the cluster with the highest rank score (cluster 1) and in the cluster harboring the most genes (cluster 3). This suggests that the combination of TWAS genes and meta-signatures successfully expanded the genetic signatures of AD.

Fig. 3: Sub-networks of the PPI network constructed with the functional protein association retrieved from the STRING database using the TWAS genes and meta-signatures.

PPI sub-networks clustered using MCODE: a cluster 1, b cluster 2, and c cluster 3. The size of each node is proportional to the degree of the node. The weight of each node (Z (TWAS) or meta-analysis log2FC) is indicated by the color of the node. The shape of the node indicates the source of the gene: circles, rectangles, and triangles correspond to genes from TWAS, the meta-analysis, and the STRING database, respectively. Genes significantly associated in both TWAS and meta-analysis are marked with a red asterisk.

Additionally, we analyzed the connections between genes from our analyses and known AD-associated genes in functional networks specific to three tissues (blood, blood plasma, and skin) and 12 cell types (B-lymphocytes, culture condition CD8 cells, dendritic cells, eosinophils, granulocytes, keratinocytes, monocytes, mononuclear phagocytes, natural killer cells, neutrophils, skin fibroblasts, and T-lymphocytes)29. We compared the gene–gene functional connectivity between known AD markers and the 289 genes from our analyses with the connectivity between AD markers and 289 randomly selected genes. In all 15 networks, genes from our analyses showed significantly higher connectivity (P < 0.001, one-tailed Mann–Whitney test) with known AD markers than random genes, suggesting their tissue- and cell-specific functional involvement in AD etiology (Supplementary Fig. S9).

Identifying potential drug candidates for AD

Using TWAS genes and meta-signatures, we searched for drug candidates for AD via a drug-repositioning approach. The connectivity map (CMAP) database contains genome-wide transcriptional change data measured after the addition of small molecules (perturbagens). Enrichment scores of TWAS genes (TWAS-ES) and meta-signatures (Meta-ES) for each perturbagen were calculated using CMAP, and perturbagens with product scores >0.6 were selected (Supplementary Data 6). Perturbagens selected as potential drug candidates were pararosaniline (TWAS-ES: 0.875; Meta-ES: 0.981; product score: 0.858), 2-deoxy-D-glucose (TWAS-ES: 0.916; Meta-ES: 0.936; product score: 0.857), cantharidin (TWAS-ES: 0.839; Meta-ES: 0.869; product score: 0.729), MG-132 (TWAS-ES: 0.683; Meta-ES: 0.984; product score: 0.672), and 1,4-chrysenequinone (TWAS-ES: 0.836; Meta-ES: 0.736; product score: 0.615) (Fig. 4a). To assess coherence between the drug lists derived from the two different sources, we analyzed the correlation between TWAS-ES and Meta-ES. For CMAP drugs that were significantly enriched (P < 0.01) in both TWAS and meta-analysis, the two scores were positively correlated (R = 0.414, P = 2.791 × 10−11), indicating that the drugs enriched by the TWAS and meta-analysis methods were coherent (Fig. 4b).
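The reported product scores are consistent with a simple multiplication of the two enrichment scores. The sketch below assumes that definition (product score = TWAS-ES × Meta-ES), which reproduces the five values quoted above to three decimals; the function name and data layout are illustrative.

```python
# Sketch: recompute product scores from the enrichment scores quoted in the text.
# Assumption: product score = TWAS-ES * Meta-ES (this matches the reported values).

candidates = {
    # name: (TWAS-ES, Meta-ES), values copied from the text
    "pararosaniline": (0.875, 0.981),
    "2-deoxy-D-glucose": (0.916, 0.936),
    "cantharidin": (0.839, 0.869),
    "MG-132": (0.683, 0.984),
    "1,4-chrysenequinone": (0.836, 0.736),
}

def product_score(twas_es: float, meta_es: float) -> float:
    """Combine the two enrichment scores by multiplication."""
    return twas_es * meta_es

CUTOFF = 0.6  # selection threshold stated in the text

selected = {
    name: round(product_score(twas_es, meta_es), 3)
    for name, (twas_es, meta_es) in candidates.items()
    if product_score(twas_es, meta_es) > CUTOFF
}

for name, score in sorted(selected.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score}")
```

Because both scores lie between 0 and 1 here, a product above 0.6 requires each individual score to be well above 0.6, so the cutoff favors perturbagens supported by both gene lists.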

Fig. 4: Identification of potential drugs for AD through in silico drug repositioning.

a A scatter plot of the calculated product score. Highly enriched drugs (product score > 0.6) are marked with red and annotated. b A scatter plot showing the correlation between the enrichment of perturbagens calculated with TWAS genes and meta-signatures. c The structure similarity analysis results comparing the potential drug candidates and reference drugs. The intensities of red rhombi are proportional to the cosine coefficient similarity index. d A network showing the similarities in MOAs of potential drug candidates and reference drugs. Red and blue nodes correspond to the potential drug candidates and reference drugs, respectively.

Finding molecules that are structurally or functionally similar to currently used drugs is a basic approach for drug repositioning. Therefore, we assessed the similarities of structures and modes of action (MOAs) between our drug candidates and four reference drugs used to treat AD selected from three categories: tacrolimus as a topical calcineurin inhibitor, hydroxyzine and diphenhydramine as antihistamines, and cefalexin as an antibiotic30,31,32. We compared the chemical structures of our potential drug candidates and the reference drugs using the cosine coefficient (Fig. 4c). Our drug candidates showed cosine coefficients in the range 0.222–0.544 compared with the reference drugs. Cantharidin and 2-deoxy-D-glucose were similar to the reference drug tacrolimus, and MG-132 to cefalexin and diphenhydramine, suggesting their potential as treatment options for AD.
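For binary structural fingerprints, the cosine coefficient reduces to the number of shared bits divided by the geometric mean of the two bit counts. The sketch below illustrates that calculation only; the fingerprints are made-up sets of on-bit indices, and the text does not state which fingerprint type or software was used, so none is assumed here.

```python
import math

def cosine_coefficient(bits_a: set, bits_b: set) -> float:
    """Cosine similarity of two binary fingerprints given as sets of on-bit indices."""
    if not bits_a or not bits_b:
        return 0.0  # an empty fingerprint shares nothing with any other
    shared = len(bits_a & bits_b)
    return shared / math.sqrt(len(bits_a) * len(bits_b))

# Hypothetical fingerprints, for illustration only.
fp_candidate = {1, 4, 7, 9, 12, 15}
fp_reference = {1, 4, 8, 9, 20, 21, 22, 30}

score = cosine_coefficient(fp_candidate, fp_reference)
print(round(score, 3))  # 3 shared bits / sqrt(6 * 8)
```

The coefficient is 1 for identical fingerprints and 0 for fingerprints with no bits in common, so the reported range of 0.222–0.544 corresponds to partial structural overlap.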

We carried out network-based MOA analysis to investigate the similarities in the transcriptional signatures of the drug candidates and reference drugs. Each drug candidate connected with at least one reference drug either directly or with just one stopover, as shown in Fig. 4d. 2-deoxy-D-glucose was directly connected to hydroxyzine and, with tyrphostin as a stopover, indirectly connected to tacrolimus, with which it also showed structural similarity33. Pararosaniline had two indirect paths to hydroxyzine, via a merged gene signature from PEGylated liposomal doxorubicin (PLD+) or via an actin polymerization inhibitor, cytochalasin B34,35. Both 1,4-chrysenequinone and cantharidin were directly connected to the reference drug cefalexin.

By analyzing gene lists from TWAS and the transcriptome meta-analysis with CMAP, we identified potential drug candidates that showed substantial similarities with currently used drugs in terms of chemical structures and MOAs, suggesting their potential for ameliorating AD symptoms.

Discussion

TWAS calculates the expected gene expression values based on large-scale GWAS, of which the sample number usually exceeds those of transcriptome experiments from clinical studies. By predicting tissue-specific expression levels of AD using TWAS, we could identify four novel genes (Fig. 1a). LINGO4 is a gene encoding a protein with an Ig-like C2 type domain and 13 leucine-rich domains. A previous study indicated the association between LINGO4 and essential tremor in a Chinese population, but the contribution of LINGO4 to AD has not been revealed, to the best of our knowledge36. The gene product of RFX5 is reported to be associated with interferon gamma activation or major histocompatibility complex II gene expression, suggesting its role in AD pathogenesis37,38,39. Several studies mentioned the P4HA2 gene in AD or AD-like symptoms, but none of these reports highlighted P4HA2 as a major risk factor for AD40,41,42. The RBM17 gene encodes a protein that induces cell cycle-related biological pathways43. This gene was mentioned in previous reports but was never highlighted as a main causal genetic risk factor for AD44,45. While recent research by Sobczyk et al. utilized the GWAS summary statistics from the EAGLE Consortium, which is the largest multi-ancestry study containing the genotypes of AD patients and healthy controls of European, African, Japanese, and Latin American ancestry, we used the summary statistics of a European population from UK Biobank45. For this reason, we may have estimated genetic risk factors for AD in the European population more precisely, thereby identifying genes that were not found in the previous study.

Functional annotation of TWAS signals also conformed to known characteristics of AD pathogenesis (Fig. 1b). The most well-known genetic risk factor, FLG, is associated with the cornified envelope and peptide cross-linking, which are representative characteristics of AD and trigger skin barrier dysfunctions46,47,48. Enriched pathways in blood-related panels were related to immune responses such as the function and regulation of type 1 helper T cells, which are a signature of the transition from early- to chronic-stage AD49.

Our meta-analytic approach combined five independent transcriptome datasets from previously published studies into a merged set with adjusted batch effects (Fig. 2a). Even though transcriptome meta-analyses have been previously performed, our study used 233 samples, which is the largest sample to date50,51,52. Because statistical power improves by increasing sample size, we obtained a meta-signature showing clear expression patterns across the samples and identified five novel genes, C1orf162, NOCT, TIGAR, SCIN, and BOC, that may play crucial roles in AD pathogenesis (Fig. 2b, c). Notably, TWAS signals were enriched in hedgehog signaling, and we identified the BOC gene, which plays a role in hedgehog signaling, from the meta-analysis (Figs. 1b and 2). The pathogenetic role of hedgehog signaling in AD has received some attention in recent experimental studies, and our study also revealed the connection between AD etiology and the abnormal activation of this signaling pathway.

TWAS has advantages in its sample size and statistical power for detecting genetic risk factors and their associated genes, whereas transcriptome studies measure expression values. We believe that integrating these two approaches complements what each method lacks. TWAS genes and meta-signature genes of AD were connected in two major sub-networks on the PPI network, suggesting that these gene connections may relate to AD pathogenesis (Fig. 3a–c).

We calculated product scores using TWAS genes and meta-signatures as inputs and identified five potential drugs for AD: pararosaniline, 2-deoxy-D-glucose, cantharidin, MG-132, and 1,4-chrysenequinone (Fig. 4a–d). Pararosaniline, 2-deoxy-D-glucose, cantharidin, and their derivatives have in vivo and/or clinical evidence of ameliorating various dermatological conditions53,54. MG-132 and 1,4-chrysenequinone reversed the gene expression patterns of AD in our in silico approach. Pararosaniline is an organic compound used as a fixation dye for frozen tissues or for the detection of aldehydes in biological materials55. Gentian violet, a hexamethyl form of pararosaniline, was previously used as an antibiotic, but has recently received attention for its potential to treat dermatologic diseases such as hypereosinophilic syndrome and pachyonychia congenita52. The glucose derivative 2-deoxy-D-glucose is used as an imaging agent for in vivo fluorescence imaging and has been implicated in targeted cancer therapies56,57. It also significantly ameliorates skin inflammation in dermatitis mouse models53. Cantharidin is a natural terpenoid compound produced by blister beetles, which were used in ancient Asia to treat conditions such as arthritis, pneumonia, ulcers, and smallpox54. Recent studies used cantharidin to manage dermatologic diseases such as molluscum contagiosum and warts58,59. MG-132 is a proteasome inhibitor with anti-cancer activities that can also temporarily alleviate AD-like symptoms in a murine model60,61. 1,4-chrysenequinone, a para-quinone antioxidant, is associated with antigen processing and presentation62,63,64. Several studies have suggested 1,4-chrysenequinone as a therapeutic agent for cancerous diseases65,66. While our drug candidates showed moderate structural similarity with known AD drugs (0.222 < cosine coefficient < 0.544), we observed suggestive similarities in MOAs.

We combined two powerful approaches, TWAS and transcriptome meta-analysis, to investigate the complicated biological nature of AD and identified potential therapeutics through in silico drug repositioning (Fig. 5). We identified novel genetic factors associated with AD risk and/or pathogenesis, which have roles in skin barrier abnormality, immune cell dysregulation, cell cycles, and immune responses, through an integrative transcriptome approach. Because we used an in silico approach, our results may need to be validated with experimentation. While animal models for AD are available, they are imperfect representations of human AD and only have an AD-like phenotype67,68. Transcriptomic profiles of each murine model with AD-like phenotypes showed significant differences from human AD, indicating that our genetic markers need to be validated in human patients69. However, since our drug candidates are associated with ameliorating the symptoms of AD, the effectiveness could be validated using in vitro and in vivo models. We believe that our systematic large-scale analysis will expand the understanding of the biological phenomena underlying AD in humans.

Fig. 5: A schematic workflow of an integrative transcriptome-wide analysis for AD.

The left panel delineates the identification of candidate genes, and the right panel describes the integrative analysis used to analyze gene–gene connectivity and identify drug candidates for AD.

Methods

Data collection and pre-processing for TWAS

GWAS summary statistics for AD (Atlas ID: 3606; total: 289,307; control: 279,476; AD: 9831) based on UK Biobank (UKB2) were retrieved from GWAS Atlas (https://atlas.ctglab.nl/)70,71. The retrieved data were then converted into the LD score format using the LDSC software (version 1.0.1)72. An LD structure from the 1000 Genomes Project was used as the reference LD block for TWAS73. Seven eQTL panels from the GTEx project version 7 (skin-sun exposed, skin-not sun exposed, cells-transformed fibroblast, spleen, thyroid, whole blood and cells-EBV-transformed lymphocytes), and two eQTL panels from individual studies (NTR and YFS blood panel) were used as the pre-computed tissue-specific gene expression weights for TWAS74,75,76. The reference LD structure and eQTL panels were curated in the FUSION webpage (http://gusevlab.org/projects/fusion/) and used for TWAS of AD GWAS summary statistics12.

Transcriptome data collection and processing

Transcriptome data were searched in Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/) and ArrayExpress (https://www.ebi.ac.uk/arrayexpress/). Raw expression data and counts were retrieved for microarray datasets and RNA-seq data, respectively. Data derived from skin tissues of AD patients and healthy control groups were selected. The selected data consisted of one RNA-seq experiment (GSE121212; 38 controls and 27 AD patients) and four microarray experiments (GSE16161, GSE5667, GSE120721, and E-MTAB-8149, with 9, 5, 22, and 19 controls and 9, 6, 15, and 83 AD patients, respectively). RNA-seq data were processed and normalized using the edgeR R package, and the counts per million (cpm) were calculated with the DESeq2 R package77,78,79. Microarray data were normalized using the robust multi-array average method in the oligo R package80.

Tissue-specific enrichment analysis of GWAS signals

Tissue specificity analysis based on the GWAS data was conducted with the GENE2FUNC process of the FUMA web server17. The threshold for enrichment significance was Bonferroni-corrected P < 0.05. Tissue-specific heritability enrichment analysis was performed with LDSC-SEG on the multi-tissue expression and chromatin datasets that contained the tissue-specific gene expressions and epigenetic chromatin modifications, respectively18. Tissues with FDR < 0.05 were regarded as significantly enriched.

Transcriptome-wide association analysis

FUSION performs summary-based gene expression imputation to identify the association between expected gene expression values and the trait by applying weighted-linear mixed models using pre-computed eQTL panels composed of cis-effects on SNP-gene regulation and SNP-trait effects. TWAS for AD summary statistics was performed using the default parameters of FUSION. Gene expression was calculated with four models: best linear unbiased predictor, Bayesian sparse linear mixed model, elastic net, and least absolute shrinkage and selection operator. The result from the best performing model of each gene was displayed as the expected gene expression value. A permutation test was performed using FUSION to evaluate the robustness of the TWAS signals (number of permutations: 100,000).
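As an illustration of summary-based imputation, the association statistic for one gene is commonly written as the weighted sum of GWAS Z-scores divided by the standard deviation implied by the LD matrix, Z_TWAS = w'z / sqrt(w'Σw). The sketch below implements that expression in plain Python; the weights, Z-scores, and LD values are made-up numbers, and the code is not the FUSION implementation.

```python
import math

def twas_z(weights, gwas_z, ld):
    """Compute w'z / sqrt(w' LD w) for one gene.

    weights: expression weights for the cis-SNPs of the gene
    gwas_z:  GWAS Z-scores for the same SNPs, in the same order
    ld:      SNP-by-SNP LD (correlation) matrix as a list of rows
    """
    n = len(weights)
    if len(gwas_z) != n or len(ld) != n or any(len(row) != n for row in ld):
        raise ValueError("weights, gwas_z, and ld must have matching dimensions")
    numerator = sum(w * z for w, z in zip(weights, gwas_z))
    variance = sum(weights[i] * ld[i][j] * weights[j]
                   for i in range(n) for j in range(n))
    if variance <= 0:
        raise ValueError("w' LD w must be positive")
    return numerator / math.sqrt(variance)

# Hypothetical three-SNP example.
weights = [0.5, -0.2, 0.1]
gwas_z = [4.0, -1.0, 2.0]
ld = [
    [1.0, 0.3, 0.0],
    [0.3, 1.0, 0.1],
    [0.0, 0.1, 1.0],
]
z = twas_z(weights, gwas_z, ld)
print(round(z, 2))
```

The denominator accounts for correlation among SNPs, so that weights placed on SNPs in strong LD do not inflate the statistic.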

Gene prioritization analysis

MAGMA was performed with the FUMA web server (https://fuma.ctglab.nl/), and COLOC analysis was implemented in FUSION software17,19 for the genes that showed P < 0.05. The significance threshold for MAGMA was Bonferroni-corrected (P < 0.05/18,899 analyzed genes = ~2.64 × 10^−6). COLOC evaluates five hypotheses, which correspond to the following in our analysis. H0: there is no causal variant; H1: there is a causal variant for the phenotype only; H2: there is a causal variant for the eQTL only; H3: the phenotype and gene expression are driven by two different causal variants; and H4: the phenotype and gene expression share the same causal variant. Following Li et al.81, we set the threshold for colocalization as PP3 + PP4 > 0.8 and PP4/PP3 > 2.
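The two decision rules in this paragraph can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the FUMA or FUSION implementation; the function names are hypothetical, and the default ratio cutoff of 2 assumes the text's threshold reads as "PP4/PP3 > 2" followed by a citation number.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def is_colocalized(pp3: float, pp4: float,
                   min_sum: float = 0.8, min_ratio: float = 2.0) -> bool:
    """Apply the colocalization rule PP3 + PP4 > min_sum and PP4/PP3 > min_ratio.

    The ratio test is written as pp4 > min_ratio * pp3 so that pp3 == 0
    does not cause a division by zero. Default cutoffs are assumptions
    taken from the text, not values read from any software.
    """
    return (pp3 + pp4) > min_sum and pp4 > min_ratio * pp3


# Hypothetical usage with the number of genes stated in the text.
magma_cutoff = bonferroni_threshold(0.05, 18899)
print(f"MAGMA cutoff: {magma_cutoff:.3e}")
print(is_colocalized(pp3=0.10, pp4=0.85))  # strong shared-variant support
print(is_colocalized(pp3=0.50, pp4=0.45))  # two distinct variants are as likely
```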

Post-analysis of TWAS results

To assess the associations of multiple TWAS signals at the same locus, we conducted conditional and joint analysis for TWAS-significant loci with the FUSION post-process function. To support the robustness of novel TWAS signals, we performed fine-mapping of TWAS associations using the FOCUS method (version 0.6.10) proposed by Mancuso et al.82, with eQTL panels confined to the tissue in which the TWAS-significant loci of interest were observed. FOCUS identifies credible gene sets containing causal genes at the nominal confidence level (over 90%). Additionally, the biological pathways related to TWAS signals were analyzed by GSEA using TWAS-GSEA (v.1.2, https://github.com/opain/TWAS-GSEA) with GO-BP and KEGG reference gene sets retrieved from the Molecular Signatures Database (MSigDB, http://software.broadinstitute.org/gsea/msigdb)21,22,83,84,85. Tissue-specific effects of TWAS results were analyzed by calculating the mean of squared Z (TWAS) for each tissue following Mancuso et al.86.
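The per-tissue summary statistic mentioned last, the mean of squared TWAS Z scores, is simple enough to sketch. The input layout (a list of tissue and Z-score pairs) and the function name are assumptions made for illustration; the original analysis may have organized its results differently.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def mean_squared_z(records: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Return the mean of Z^2 per tissue from (tissue, z) pairs."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for tissue, z in records:
        sums[tissue] += z * z
        counts[tissue] += 1
    return {tissue: sums[tissue] / counts[tissue] for tissue in sums}


# Hypothetical TWAS Z scores for two tissues.
example = [("skin", 2.0), ("skin", -4.0), ("blood", 1.0)]
print(mean_squared_z(example))
```

Because the Z scores are squared, strong positive and negative associations both raise a tissue's score.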

Transcriptome meta-analysis

Individual datasets were merged on their common Entrez IDs. The cpm values of the RNA-seq dataset were transformed as log2(cpm + 0.25) to avoid undefined values for zero counts, following Mooney et al. with slight modifications87. Briefly, cpm values were used instead of fragments per kilobase per million mapped reads (FPKM) values. Batch effects between datasets were corrected using the ComBat function in the sva R package88. DEGs between the control group and the AD group were identified using the limma R package89. DEGs with positive and negative log2FCs were regarded as upregulated and downregulated meta-signatures, respectively.
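As a minimal sketch of the log2(cpm + 0.25) transformation, the following Python function converts the raw counts of one sample to cpm and applies the offset. It assumes cpm is computed from the plain library size (sum of counts), without the normalization factors that edgeR or DESeq2 may apply; the function name is hypothetical.

```python
import math
from typing import List, Sequence


def log2_cpm(counts: Sequence[float], offset: float = 0.25) -> List[float]:
    """Transform raw counts of one sample to log2(cpm + offset).

    cpm = count / library_size * 1e6, with library_size taken as the plain
    sum of counts (an assumption; no normalization factors are applied).
    The offset keeps the logarithm finite when a count is zero.
    """
    library_size = sum(counts)
    if library_size <= 0:
        raise ValueError("library size must be positive")
    return [math.log2(c / library_size * 1e6 + offset) for c in counts]


# Hypothetical sample with one unexpressed gene.
print(log2_cpm([0, 500000, 500000]))
```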

Validating correlation between TWAS results and meta-analysis

GSEA was performed to examine the functional correlation between the TWAS results from each panel and the results from the transcriptome meta-analysis. The GSEA pre-ranked method was applied to the gene sets of up- or downregulated meta-signatures, with the TWAS results ranked by the Z (TWAS) values from each panel. The significance threshold for enrichment was set as FDR < 0.25 following the recommendation of MSigDB. Functional annotation of the meta-analysis results was performed with the GSEA pre-ranked method by ordering the genes by their log2FC values. To analyze the overlapping enrichment with the TWAS results, we applied the gene sets used for TWAS-GSEA as the reference gene sets.
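For readers unfamiliar with the pre-ranked approach, the core quantity is a weighted running-sum enrichment score computed over the ranked gene list. The sketch below follows the commonly described form of that statistic (hits weighted by the absolute ranking score, misses weighted equally). It is an illustration only: it omits the permutation-based normalization and FDR estimation performed by the GSEA software, and the function name and inputs are hypothetical.

```python
from typing import List, Set, Tuple


def enrichment_score(ranked: List[Tuple[str, float]], gene_set: Set[str],
                     weight: float = 1.0) -> float:
    """Running-sum enrichment score for a pre-ranked gene list.

    ranked: (gene, score) pairs already sorted by score, descending
            (e.g. TWAS Z values or log2FC values).
    Returns the running-sum value with the largest absolute deviation from 0.
    """
    hits = [gene in gene_set for gene, _ in ranked]
    n_miss = len(ranked) - sum(hits)
    hit_norm = sum(abs(s) ** weight for (_, s), h in zip(ranked, hits) if h)
    if n_miss == 0 or hit_norm == 0:
        raise ValueError("need at least one miss and one non-zero-weight hit")

    running, best = 0.0, 0.0
    for (_, score), is_hit in zip(ranked, hits):
        if is_hit:
            running += abs(score) ** weight / hit_norm  # step up at a set member
        else:
            running -= 1.0 / n_miss                     # step down otherwise
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranked list; the set sits at the top, so the score is positive.
ranked_genes = [("A", 3.0), ("B", 2.0), ("C", 1.0), ("D", -1.0), ("E", -2.0)]
print(enrichment_score(ranked_genes, {"A", "B"}))
```

A positive score indicates that the gene set concentrates at the top of the ranking, a negative score at the bottom.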

Network analysis

The significant genes from TWAS and DEGs from transcriptome meta-analysis were used as the input nodes for network analysis. STRING (https://string-db.org/) was used to construct PPI networks90. Constructed networks were processed using Cytoscape (version 3.8.2), and sub-network analysis was performed with the MCODE Cytoscape plug-in and the NetworkAnalyzer Cytoscape tool91,92,93.

The list of 2817 known AD-associated markers was downloaded from the Open Targets Platform (https://platform.opentargets.org/)94. Tissue- or cell-specific functional networks were retrieved from HumanBase (https://hb.flatironinstitute.org/), and 15 AD-related tissue- or cell-specific networks were selected29. The selected networks covered three tissues (blood, blood plasma, and skin) and 12 cell types (B-lymphocytes, culture condition CD8 cells, dendritic cells, eosinophils, granulocytes, keratinocytes, monocytes, mononuclear phagocytes, natural killer cells, neutrophils, skin fibroblasts, and T-lymphocytes). Because the edge weights were extremely skewed, and to avoid ‘not-available’ values, they were scaled as log2(connectivity score + 1). The scaled weights were then analyzed with a one-tailed Mann–Whitney test.
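The scaling and the one-tailed Mann–Whitney comparison can be sketched with the Python standard library alone. The version below uses midranks for ties and a normal approximation with tie and continuity corrections; it is an assumption that this matches the settings of the original R analysis, and the grouping of edge weights into two samples is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def log2_scale(weights: Sequence[float]) -> List[float]:
    """Scale edge weights as log2(weight + 1)."""
    return [math.log2(w + 1.0) for w in weights]


def mann_whitney_greater(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """One-tailed Mann-Whitney U test (alternative: x tends to exceed y).

    Returns (U statistic for x, approximate p-value from a normal
    approximation with tie correction and continuity correction).
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = n1 + n2
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2.0 + 1.0          # average rank of the tied block
        for k in range(i, j + 1):
            ranks[k] = midrank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sum_x = sum(r for r, (_, grp) in zip(ranks, pooled) if grp == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 1.0                        # all values tied: no evidence
    z = (u_x - mean_u - 0.5) / math.sqrt(var_u)
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))
    return u_x, p_value


# Hypothetical edge weights: marker-linked edges versus background edges.
marker_edges = log2_scale([31.0, 63.0, 127.0])
background_edges = log2_scale([0.0, 1.0, 3.0])
print(mann_whitney_greater(marker_edges, background_edges))
```

Since the test is rank-based, a monotone transformation such as log2(x + 1) does not change its result; the scaling mainly affects how the weights are displayed and summarized.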

Drug repositioning with computational tools

CMAP is a web-based drug-repositioning tool that compares input up- and downregulated gene signatures with the in vitro-derived drug signatures in the CMAP database (https://portals.broadinstitute.org/cmap/) using Kolmogorov–Smirnov statistics95. TWAS-significant genes and meta-signatures were used separately as input for the analysis. Both gene lists were converted to the corresponding Affymetrix probe identifiers, and the queries were executed by reversing the AD signatures. Enrichment scores for each drug were combined by calculating individual product scores following Liu et al., and candidates with a product score > 0.6 were selected96.

Similarity analysis with currently approved drugs for AD

The connectivity between approved AD drugs and our drug candidates was assessed following Kim et al.97. Among approved AD drugs, small molecules that are available in MANTRA 2.0 were selected as reference drugs. MOA similarities were analyzed with the MANTRA 2.0 web-based platform98. The maximum number of neighboring nodes was set to 10, and the MOA similarity network was visualized with Cytoscape (version 3.8.2). Structural information on the molecules in .sdf format was retrieved from DrugBank (https://drugbank.ca) and PubChem (https://pubchem.org) using the rcdk R package99. To compare structural similarities, the extended connectivity fingerprint with a diameter of 4 was calculated for each molecule, and the cosine coefficients between the drug candidates and the reference drugs were calculated with the Rcpi R package100.
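The cosine coefficient between two binary fingerprints reduces to a set computation: the number of shared on-bits divided by the geometric mean of the two on-bit counts. The sketch below assumes each fingerprint is already available as a set of on-bit indices; it does not compute extended connectivity fingerprints itself, and it is not the Rcpi implementation.

```python
import math
from typing import Set


def cosine_coefficient(bits_a: Set[int], bits_b: Set[int]) -> float:
    """Cosine similarity between two binary fingerprints given as on-bit sets.

    For binary vectors this equals |A & B| / sqrt(|A| * |B|).
    Returns 0.0 when either fingerprint has no on-bits.
    """
    if not bits_a or not bits_b:
        return 0.0
    shared = len(bits_a & bits_b)
    return shared / math.sqrt(len(bits_a) * len(bits_b))


# Hypothetical on-bit indices for a candidate and a reference drug.
candidate = {1, 2, 3, 4}
reference = {3, 4, 5, 6}
print(cosine_coefficient(candidate, reference))
```

The coefficient ranges from 0 (no shared substructure bits) to 1 (identical fingerprints).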

Statistical analysis

Statistical analyses were conducted using the statistical programming language R (version 4.0.3). The results were visualized with the R packages ggplot2 and ggrepel (https://github.com/slowkow/ggrepel).

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.