Multifaceted analysis of cross-tissue transcriptomes reveals phenotype–endotype associations in atopic dermatitis

Atopic dermatitis (AD) is a skin disease that is heterogeneous both in terms of clinical manifestations and molecular profiles. It is increasingly recognized that AD is a systemic rather than a local disease and should be assessed in the context of whole-body pathophysiology. Here we show, via integrated RNA-sequencing of skin tissue and peripheral blood mononuclear cell (PBMC) samples along with clinical data from 115 AD patients and 14 matched healthy controls, that specific clinical presentations associate with matching differential molecular signatures. We establish a regression model based on transcriptome modules identified in weighted gene co-expression network analysis to extract molecular features associated with detailed clinical phenotypes of AD. The two main, qualitatively differential skin manifestations of AD, erythema and papulation are distinguished by differential immunological signatures. We further apply the regression model to a longitudinal dataset of 30 AD patients for personalized monitoring, highlighting patient heterogeneity in disease trajectories. The longitudinal features of blood tests and PBMC transcriptome modules identify three patient clusters which are aligned with clinical severity and reflect treatment history. Our approach thus serves as a framework for effective clinical investigation to gain a holistic view on the pathophysiology of complex human diseases.

Supplementary Figure 1 A schematic presentation showing the process of filtering sample and patient for each analysis.The histogram shows the distribution of AD patients in cross-sectional analysis at different scores of EASI.Severity strata was defined using EASI in individual patients; mild: 0.1 -5.9, moderate: 6.0 -22.9, severe: 23-72.Source data are provided as a Source Data file.Supplementary Figure 5 Immunohistochemistry of skin tissue from the AD patients who have either erythema-or papulation-skewed skin manifestations.a.We defined the degree of erythema-skewness as erythema/(erythema + papulation) using EASI partial points, and randomly picked 5 patients who have erythema-skewness ≧ 0.6 as erythema-skewed patients and 5 patients who have erythema-skewness ≦ 0.4 as papulation-skewed patients.b,c.Immunohistochemistry of skin tissue stained for CD4 (target protein was stained in red) in 5 erythema-skewed patients (b) and 5 papulation-skewed patients (c) picked.Bars: 500 μm (the right column), 100 μm (the middle and left columns).One slide per patient was assayed for one marker protein in histological analysis.Immunohistochemistry of skin tissue in two representative AD patients who have a score composition that are highly skewed to either of erythema (upper) or induration/papulation (lower) as well as healthy control.Left: a 51-year-old male patient who has erythema-skewed EASI composition.Middle: a 50year-old male patient who has papulation-skewed EASI composition (total = 21.0,erythema = 3.0, papulation = 8.4).Right: a 52-year-old female who have no eczema nor skin diseases.Target proteins were stained in red.Bars = 100 μm (except for low magnification HE images).MPO: myeloperoxidase, MBP: major basic protein, K16: keratin-16, FLG: filaggrin.One slide per patient was assayed for one marker protein in histological analysis.Serial skin sections AD#1 (Erythema-skewed)            The number of active connections were assessed based on cytokine -receptor coupling.Severity strata was defined using EASI in individual patients; mild: 0.1 -5.9, moderate: 6.0 -22.9, severe: 23-72.Multiple comparison tests were carried out on three AD severity strata with Kruskal-Wallis test.Boxplots show median and first and third quartiles, whiskers extending to the highest and lowest values no further than 1.5*interquartile range.N; Healthy = 14, Mild = 19, Mild = 63, Severe = 33 (biologically independent samples).Source data are provided as a Source Data file.Top 5 PC1 contributing genes

GO description
Fatty

sModu05
(2.1E-13) sModu06 c lu s te r 1 c lu s te r 2 c lu s te r 3 pModu07.mac c lu s te r 1 c lu s te r 2 c lu s te r 3 pModu09.mac c lu s te r 1 c lu s te r 2 c lu s te r 3 Lympho.mac c lu s te r 1 c lu s te r 2 c lu s te r 3 Neutro.mac Supplementary Table 1 Pilosebaceous unit-related gene set defined for filtering skin samples.Pilosebaceous unit-related genes were extracted by comparison between two patient clusters identified by applying unsupervised k-means clustering (k = 2) on healthy controls.

Supplementary Tables
Supplementary Table 2 Drugs used for systemic treatment of AD in this study population.

Drug name Category
Cyclospoin in the HCs than the AD patients in sModu09 (GO: Extracellular matrix organization, top genes: PI16, FBLN1, ADH1B, MFAP4, CFD), sModu17 (GO: Formation of the cornified envelope, top genes: FLG2, LOR, LCE5A, FLG, IL37) in skin, suggesting the importance of these modules for normal function of epidermal barrier and connective tissue in healthy skin.

Application of linear mixed model on time series dataset
To examine the relationship between each of the blood analytes and the disease severity in individual patients in the longitudinal settings, we applied linear mixed model (LMM) on time series data using lmer() function from the lme4 R package [1].Respective blood derived parameter was tested for linear relationship with disease severity, with patient IDs used as random effects.While fixed effect coefficient with p > 0.05 was considered to be significant, random effect coefficient with SD > 0.13 was considered to be at large variance across patients, suggested to be potential stratifying factors that would highlight the differential longitudinal features among patients.
Accordingly, four blood derived parameters, TARC, Eosinophil, LDH and PLT were found to have significant fixed effects in linear relationship with disease severity (coefficient p < 0.05), under the assumption that individual patients harbor their own random effects both in intercept and coefficient.Above all, TARC showed the smallest p-value and the largest coefficient value as a fixed effect, with substantial variance in random effects of both intercept and coefficient.This suggests that TARC bears varying size in both the baseline and effects on linear relation to disease severity by patients, thereby being helpful for fundamental characterization of patients in the context of inter-individual difference as well as intraindividual variation in the longitudinal setting.

Supplementary Figure 3
Histogram of frequency of AD patients at different disease severity.

Figure 4
Identification of the optimal number of clusters for individual EASI partial scores.a,b.Silhouette width plot (a) and elbow plot (b) for identifying the optimal number of clusters for the correlation matrix of EASI partial score.The X-axes indicate the number of clusters ranging from 1 to 10 and the Y-axes indicate average silhouette width (a) or total within sum of square (b).Both of a highest average silhouette coefficient in the silhouette plot and a large drop in the elbow plot were observed at the number of clusters of 2.

FLGSupplementary Figure 6
Immunohistochemical analysis revealed shared and differential characteristics in the skin tissue of erythema and papulation-skewed AD patients.

Supplementary Figure 8
Comparison of number of active connections between skin and PBMC among three AD severity strata.

Supplementary Figure 9
Statistics of transcriptome modules associated with AD. a.A plot shows the size of PC1 value per patient.PC1 value were obtained by applying PCA on gene expression data of patients for each gene module.b.Variance explained by PC1-PC20 in PCA on expression level of transcriptome modules in skin (upper) and PBMC (lower) across patients.Source data are provided as a Source Data file.
of time series features in top contributing factors among three patient clusters in AD.Two classes of time series features: mean (upper) and mean absolute change (MAC, lower) of top contributing factors in principal component analysis of time series features across patients.Boxplots show median and first and third quartiles, whiskers extending to the highest and lowest values no further than 1.5*interquartile range.Brunner-Munzel rank test.Multiple comparison tests were carried out using Kruskal-Wallis test.P-values less than 0.05 were considered as significant and subsequently tested for post-hoc comparison with Brunner-Munzel test with p-value correction by Holm's method.NS: not significant, *p < 0.05, **p < 0.01, ***p < 0.001.N; cluster1 = 2, cluster2 = 7, cluster3 = 21 (biologically independent samples).Source data are provided as a Source Data file.

Supplementary Figure 12 Association of omics features with clinical severity in a longitudinal setting.
Trajectories of observed/predicted EASI (upper) and intensity of blood examination and PBMC transcriptome modules (lower) in one year in two representative patients.