Introduction

Atopic dermatitis (AD) is a common chronic pruritic inflammatory disease of the skin that has a prevalence of about 10–15% in children and 5–10% in adults. For many, it occurs in childhood, can wax and wane in severity, and is often associated with other illnesses like asthma and seasonal allergies1. The lesions tend to be erythematous with areas of skin erosion and crusting. Loss of epidermal integrity is a hallmark feature of AD, the risk for which has been associated with variation in several genes but most prominently with filaggrin, which produces a protein that is important to epidermis development1,2.

Variations in the genes that encode for keratin protein and in its production have been associated with many human diseases and are essential for the production and maintenance of a healthy epidermis3,4. Keratin 6 (KRT6), like filaggrin, is found in the outer epidermis3,4. KRT6 is primarily expressed in wounded or stressed keratinocytes and is used as a marker for activated keratinocytes in an unhealthy epidermis3. KRT6 is an alarmin that triggers the production of proinflammatory cytokines and antimicrobial peptides. KRT6 expression persists while the skin is healing and the skin barrier is improving5. Because of this influence of KRT6 on epithelial repair, all of these functions could be related to the severity of AD. KRT6 genetic variation has been associated with psoriasis, a chronic inflammatory disease of the skin occurring in about 1 to 4% of the population5. For instance, a large study on real-world comorbidities of pediatric atopic dermatitis patients found a significantly higher odds of psoriasis among those with AD as compared to controls6. The production of KRT6 in psoriasis is thought to be a result of T cell derived cytokines5. However, individuals with psoriasis are more likely to have AD than the general population7. Finally, individuals with pachyonychia congenita, a very rare disease, have hypertrophic nail dystrophy and very painful calluses on soles of feet and hands (keratoderma). It is associated with mutations in KRT6A, KRT6B, KRT6C, KRT16, and KRT17. More than 50% of Individuals with pachyonychia congenita have KRT6A mutations, which elicit changes to KRT6 as well as other keratin proteins. The most common KRT6A mutation (p.Asn172del) occurs in exon 9 causing stop codon two amino acids upstream from the natural stop codon8, and several other mutations have KRT6A been noted9,10.

The physical appearance of AD is often complicated by excoriation, erythema, and skin barrier breakdown. After fine-sequencing the keratin 6A (KRT6A) gene in a large longitudinal cohort of children with AD, we evaluate whether KRT6A variation is associated with the severity of AD over time and identify single nucleotide polymorphisms (SNPs) of interest.

Methods

Patient data came from the Pediatric Eczema Elective Registry (PEER), which consisted of children who were diagnosed with atopic dermatitis11,12,13. Eligibility for the study included being between 2 and 17 years of age at enrollment and using pimecrolimus cream 1% for at least 6 weeks in the half year prior to enrollment. Every 6 months for up to 10 years, information regarding disease severity and medication use was obtained based on self-report from a questionnaire. Detailed information about medication use in this cohort has been previously published14,15. A subset of the cohort provided DNA samples, from which the KRT6A gene was sequenced. The study was approved by the University of Pennsylvania Institutional Review Board. All DNA samples were obtained after written informed consent was obtained, and all data were stripped of personal identifiers. Further, all methods were performed in accordance with the relevant guidelines and regulations. Since atopic dermatitis is a common illness and we sought to evaluate common alleles, the set of KRT6A SNPs that we consider are based on the a priori decision to analyze variants with minor allele frequency of at least 5%.

The primary binary outcome of disease persistence (yes/no) was measured for each participant at every six-month survey. Subjects were deemed to not have persistent AD if they reported complete disease control in the previous 6 months and were no longer using prescription creams or ointments. Our choice of outcome correlates with severity scores in both the PEER cohort16 and in other cohorts17,18. This is not a measure of time to AD remission but rather a measure of the persistence of AD at the time of assessment. Multiple measurements of the outcome of interest were available over time for each subject; the number of visits varied between 1 and 21; the average number of completed surveys was 10.1 (SD: 5.9). We used a Bayesian generalized linear mixed model (GLMM) to study the associations between SNPs of the KRT6A gene and AD persistence over time19,20. We included random effects in the GLMM to account for correlations between measurements on the same individual and to allow subject-specific trends. The model was used to assess the persistence of AD over the follow-up time with larger odds ratio (OR) values suggestive of lower disease persistence when the variant is present. These models were fit for the entire sample (N = 740) and then stratified by race, White (N = 393) and Black (N = 326), using the brms package in R19,21.

Specifically, for the full sample, the outcome for subject i at visit number j is modeled according to the following generalized linear mixed model with a logit link:

$$logit P\left({Y}_{ij}=1\right)= {\beta }_{0}+{b}_{0i}+{\beta }_{1}j+{b}_{1i}j+{\beta }_{2}\mathrm{variant}$$

where \({\beta }_{0}, {\beta }_{1}, \text{and} \;{\beta }_{2},\) are the intercept and regression coefficients for the fixed effects of visit number, and the variant, respectively, and \({b}_{0i}\) and \({b}_{1i}\) are the random intercept and random slope for visit number, each of which is distributed as \(N\left(0,{\sigma }^{2}\right)\). For an adjusted analysis, we next fit a model that also includes the covariates age of onset, sex, and race. While age of onset and sex are not considered true confounders since they do not affect the genetic variation in individuals, we include these since they may explain some of the variability in the outcome. For the analyses stratified by race, we include age of onset and sex in the model.

For a Bayesian approach, we place priors on the fixed effect coefficients as well as the standard deviations and correlations of the random effects. Each \(\beta \) is given a \(N({\mathrm{0,10}}^{2})\) prior, which is a weakly informative prior to reflect absence of prior information about the effects of the variant considered. The standard deviation of subject-level effects \({\sigma }^{2}\) is given a half student-t prior with 3 degrees of freedom and a scale parameter determined by the standard deviation of the response variable after the link transformation. This choice of prior is weakly informative so that its influence is small but provides regularization for the purpose of convergence and sampling efficiency. Correlations between the random effects are given a LKJ(1) prior, so that all correlation matrices are equally likely. Our choices of noninformative priors minimize the impact of the prior relative to the data on resulting estimates. The Bayesian framework has been previously employed in transethnic meta-analysis of genomewide association studies22,23.

Prior to the analysis of individual SNPs, all SNPs were evaluated to determine if they were correlated. Those with high correlation, based on pairwise squared correlation coefficient (\({R}^{2}\)) of at least \(0.70\), were combined as a single analytic unit. Table 1 gives the pairwise \({R}^{2}\) values between all the SNPs. Specifically, we see that rs177079, rs298118, rs298121, rs298122, and rs1054122 have pairwise \({R}^{2}\) values close to 1. We refer to this set as Group 1. We similarly define Groups 2, 3, and 4; the SNPs in each are provided in Table 2. Based on these groups, we create 4 composite variables that each indicate the presence of any of the SNPs in the group. For instance, the value of Composite 1 is 1 if the subject is observed to have at least one of SNPs in Group 1 and is 0 otherwise. There were three SNPs (rs376545, rs711317, rs3858631) that did not fall into any of the four groups in the sense that their pairwise correlations with all other SNPs considered had \({R}^{2}\) values less than 0.70. We then employed the Bayesian GLMM model to assess the association between each composite variable and the heal outcome.

Table 1 Pairwise correlations for the KRT6A SNPs with minor allele frequency of at least 5%.
Table 2 Estimates of the effect (odds ratio) of each correlated group on AD disease persistence and corresponding posterior intervals. OR greater than 1 indicate less disease persistence.

Results

Table 2 provides the associations of SNP groups as well as individual SNPs with AD persistence over time based on odds ratio (OR) and corresponding 95% credible intervals. For each composite variable or variant, an OR greater than 1 indicates that the odds of skin healing are greater for subjects with that variant while an OR less than 1 indicates the variant is associated with persistence of atopic dermatitis. The interpretation of the 95% credible interval is that, given observed data, the probability that the true OR estimate lies within the interval is 0.95; that is, it provides the range that contains the middle 95% of probable OR values based on the posterior distribution. Variants with credible intervals that do not contain 1 are deemed to be significantly associated with skin healing. As the premise of Bayesian methods considers the data or evidence to be fixed and are contingent on the available information, multiple testing corrections are not applicable24.

Subjects in Group 1 have approximately twice the odds of healing over time as compared to those that do not have any of those SNPs. The effect is in the opposite direction for Composites 2 and 3 as subjects have significantly greater AD disease persistence (Table 2). On the other hand, there does not appear to be an association between having the variants in Group 4 and disease persistence over time. rs376545 and rs711317 also have 95% credible intervals for the OR estimate that lay completely above 1, which suggest that these variants may result in improved skin healing over time. Further, the stratified estimates tend to be in the same direction as the overall estimates—that is, the effects of the SNPs on skin repair are similar for Whites and Blacks, although results were not statistically significant. Individual SNP associations are reported in Supplementary Table 1.

Discussion

In this study, we consider Bayesian generalized linear mixed models to evaluate the relationship between KRT 6A variants and atopic dermatitis outcomes and disease progression. These models have been commonly employed in RNA sequencing and microarray studies and artificial intelligence pipelines25. By utilizing a Bayesian approach for our analyses, the corresponding interpretation focuses on effect estimates and credible intervals rather than hypothesis testing which relies on significance levels and p-values.

SNPs in Group 1 were found to be associated with greater healing and repair of the skin; these are likely to assist with wound repair and re-epithelization. On the other hand, SNPs in Groups 2 and 3 resulted in worse skin outcomes, so healing may require longer duration. Building on present understanding for keratinocyte’s role in AD development26, further study into the mechanisms by which certain KRT6A SNPs affect keratinocyte activity and corresponding immune responses as well as interactions with treatments are warranted. As keratinocytes mediate immunity, greater exploration of the biological processes could provide insight into AD progression and therapeutic efforts.