Introduction

Atopic dermatitis (AD, or eczema) is a common allergic disease, characterised by (often relapsing) skin inflammation affecting up to 20% of children and 10% of adults1. Several genome-wide association studies (GWAS) have been performed in recent years, identifying genetic risk loci for AD.

Our most recent GWAS meta-analysis within the EAGLE (EArly Genetics and Lifecourse Epidemiology) consortium, published in 2015, uncovered 31 AD risk loci2. Since then, additional GWAS have been published which have confirmed known risk loci3,4 and discovered novel loci5. Five novel loci were identified in a European meta-analysis6, and variants in 3 genes were implicated in a rare variant study in addition to 5 novel loci7. Four novel loci were reported in a Japanese population (and another 4 identified in a trans-ethnic meta-analysis in the same study)8, giving a total of 71 previously reported AD loci2,3,4,5,6,7,8,9,10,11,12,13,14 (defined as 1 Mb regions), of which 57 have been reported in European ancestry individuals, 18 have been reported in individuals of non-European ancestry and 29 in individuals across multiple ancestry groups (Supplementary Data 1).

The availability of several new large population-based studies has provided an opportunity to perform an updated GWAS of AD, aiming to incorporate data from all cohorts that have contributed to previously published AD GWAS, as well as data from additional cohorts, to present the most comprehensive GWAS of AD to date, including comparison of effects between European, East Asian, Latino and African ancestral groups. In this work we identify novel loci and use multi-omic data to further characterise these associations, prioritising candidate causal genes at individual loci and investigating the genetic architecture of AD in relation to tissues of importance and shared genetic risk with other traits.

Results

European GWAS

The discovery European meta-analysis (N = 864,982; 60,653 AD cases and 804,329 controls from 40 cohorts, summarised in Supplementary Data 2) identified 81 genome-wide significant independent associated loci (Fig. 1a and Supplementary Fig. 1). Of these, 52 were at previously reported loci (Table 1) and 29 (Table 2) were novel (according to criteria detailed in the methods). All 81 were associated in the European 23andMe replication analysis (Bonferroni corrected P < 0.05/81 = 6 × 10−4, N = 2,904,664, Table 1). There was little evidence of genomic inflation in the individual studies (lambda < 1.05) or overall (lambda = 1.06). Conditional analysis determined 44 additional secondary independent associations (P < 1 × 10−5) across 21 loci (Supplementary Data 3).
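As an illustration of the inflation check reported above, the genomic inflation factor (lambda) is conventionally the ratio of the median observed association chi-square statistic to its expected median under the null. The following is a minimal sketch under that assumption; the function name and the toy P-values are hypothetical, and this is not necessarily the exact procedure used in the individual cohorts.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()
# Median of a 1-df chi-square statistic under the null (about 0.455)
NULL_MEDIAN_CHI2 = _STD_NORMAL.inv_cdf(0.25) ** 2


def genomic_inflation(p_values):
    """Return lambda = median(observed chi2) / median(null chi2).

    Assumes two-sided P-values in (0, 1] from 1-df association tests.
    """
    # Convert each two-sided P-value back to a 1-df chi-square statistic
    chi2 = [_STD_NORMAL.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / NULL_MEDIAN_CHI2


# Toy example (hypothetical data): evenly spaced P-values mimic the null
null_like = [i / 1000 for i in range(1, 1000)]
lam = genomic_inflation(null_like)
```

Values of lambda close to 1, as in the toy example, suggest little inflation of test statistics; values well above 1 can indicate confounding or polygenicity.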

Fig. 1: Manhattan plots of atopic dermatitis GWAS.

(a) the European-only fixed effects meta-analysis (n = 864,982 individuals) and (b) the multi-ancestry MR-MEGA meta-analysis (n = 1,086,394 individuals). −log10(P-values) are displayed for all variants in the meta-analysis. Variants that meet the genome-wide significance threshold (5 × 10−8, red line) are shown in green.

Table 1 Genome-wide significant loci in European-only analysis that have been previously reported
Table 2 Novel genome-wide significant loci in European-only analysis

The SNP-based heritability (h2SNP) for AD was estimated to be 5.6% in the European discovery meta-analysis (LDSC intercept = 1.042 (SE = 0.011)). This is low in comparison to heritability estimates from twin studies (~80%)15,16, but comparable with previous h2SNP estimates for AD in Europeans (5.4%)6.

Multi-ancestry GWAS

In a multi-ancestry analysis including individuals of European, Japanese, Latino and African ancestry (Supplementary Data 2, N = 1,086,394; 65,107 AD cases and 1,021,287 controls), a total of 89 loci were identified as associated with AD (Fig. 1b and Supplementary Fig. 1). 75 of these were not independent of lead variants identified in the European-only analysis (r2 > 0.01 in the relevant ancestry) and a further 9 showed some evidence for association (Bonferroni corrected P < 0.05/89 = 5.6 × 10−4) in the European analysis, but 5 were not associated (P > 0.1) in Europeans (Table 3, Supplementary Data 4).

Table 3 Additional loci associated with the multi-ancestry analysis

Of the 14 loci that reached genome-wide significance in the multi-ancestry discovery analysis only (Table 3), 8 replicated in at least one of the replication samples (of European, Latino and/or African ancestry; Bonferroni corrected P < 0.05/14 = 3.6 × 10−3). Two index SNPs which did not replicate in any of the samples (rs9864845 (near CCDC80), rs4312054 (near NLRP10)) appear to have been driven by association in the Japanese RIKEN study only (Supplementary Data 4, Supplementary Figs. 2, 3). Whilst the allele frequencies of these index SNPs are similar between Europeans and Japanese (37% vs 42% for rs9864845, 41% vs 46% for rs4312054, Supplementary Data 5), in a multi-ancestry fixed effect meta-analysis at both these loci there were neighbouring (previously reported)8 SNPs with stronger evidence of association (rs72943976, P = 2 × 10−9 and rs59039403 P = 2 × 10−35, Supplementary Fig. 3), that did show large allele frequencies for Japanese (~34% and 13%, respectively) but <1% in Europeans. A further 4 loci did not replicate, and on closer examination (Supplementary Fig. 2, and MAF in cases <1%), their association in the discovery analysis appeared to be driven by a false positive outlying result in a single European cohort.

Seven of the loci in Table 3 have been previously reported as associated with AD. Two (rs117137535 (near ARRDC1)7 and rs1059513 (near STAT6)8) were previously only associated with Europeans (and these were variants that were just below the genome-wide significance threshold in our European only analysis). Three (rs4262739 (near ETS1), rs4574025 (within TNFRSF11A) and rs6023002 (near CYP24A1)) were previously associated in Japanese and Europeans8, while 2 were previously associated only in Japanese8,10, using the same Japanese data (RIKEN) that we include here. Therefore, in our multi-ancestry analysis (and replication) we identify 3 loci that have not previously been reported in a GWAS of AD of any ancestry (rs9247 (near INPP5D), rs34599047 (near ATG5) and rs7773987 (near AHI1)), all of which are associated in two or more populations in our data (Table 3).

In addition, for 5 loci which had previously been associated with individuals of European and/or Japanese ancestry, we now show evidence that these are also associated with individuals of Latino ancestry and one is also associated in individuals of African ancestry (Table 3).

Comparison of associations between ancestries

Effect sizes of the index SNPs were remarkably similar between individuals of European and Latino ancestry (Supplementary Fig. 4A). There were only two variants with any evidence for a difference (where Latino P > 5 × 10−4 and the 95% confidence intervals did not overlap), but the plot shows that these were only marginally different and likely to be due to chance. Effect size comparison of the index SNPs between individuals of European and African ancestry showed greater differences (Supplementary Fig. 4B). 17 SNPs showed some evidence for being European-specific in that comparison. The confidence intervals in the Japanese data were much wider, but there was weak evidence for one SNP being European-specific and stronger evidence for two SNPs being Japanese-specific (Supplementary Fig. 4C). These were rs4312054 (JAP CI: 0.75-0.84, EUR CI: 0.99-1.01) and rs9864845 (JAP CI: 1.16-1.30, EUR CI: 0.99-1.06), mentioned earlier as the SNPs that appeared to be driven only by Japanese individuals in the multi-ancestry meta-analysis (Supplementary Data 4).
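The confidence-interval part of this comparison can be sketched as a simple overlap test. The snippet below is illustrative only: the helper name is hypothetical, it covers only the interval criterion (not the accompanying P-value threshold), and the intervals are those quoted in the paragraph above.

```python
def intervals_overlap(ci_a, ci_b):
    """Return True if two (lower, upper) confidence intervals share any value."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]


# Intervals quoted in the text, as (Japanese, European) pairs
reported = {
    "rs4312054": ((0.75, 0.84), (0.99, 1.01)),
    "rs9864845": ((1.16, 1.30), (0.99, 1.06)),
}

# SNPs whose Japanese and European intervals do not overlap
non_overlapping = [
    snp for snp, (jap, eur) in reported.items()
    if not intervals_overlap(jap, eur)
]
```

Both quoted SNPs fall into the non-overlapping list, which matches their description as candidates for Japanese-specific effects.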

Established associations

A review of previous work in this field (Supplementary Data 1) shows that a total of 202 unique variants (across a much smaller number of loci) have been reported to be associated with AD. We found evidence for all but 7 of these variants being nominally associated in the current GWAS (81% in the European and 96% in the multi-ancestry analysis). Variants we did not find to be associated were either rare variants (MAF < 0.01), or insertion/deletion mutations, which were not included in our analysis.

Genetic correlation between AD and other traits

LD score regression analyses showed high genetic correlation, as expected, between AD and related allergic traits, e.g. asthma (rg = 0.53, P = 2 × 10−32), hay fever (rg = 0.51, P = 7 × 10−17) and eosinophil count (rg = 0.27, P = 1 × 10−7) (Supplementary Fig. 5 and Supplementary Data 6). In addition, depression and anxiety showed notable genetic correlation with AD (rg = 0.17, P = 2 × 10−7), a relationship which has been reported previously, although causality has not been established17. Furthermore, gastritis also showed substantial genetic correlation (rg = 0.31, P = 1 × 10−5). This may be because the AD genetic signal includes variants with pervasive inflammatory function, or it could indicate a shared risk locus for inflammation or microbiome alteration in the upper gastrointestinal tract. Alternatively, it may reflect the use of systemic corticosteroid treatment for atopic disease, which in some cases causes gastritis as a side effect.

Tissue, cell and gene-set enrichment

The tissue enrichment analyses using distinct molecular evidence (representing open chromatin and gene expression) both found blood to be the tissue showing strongest enrichment of GWAS loci (Fig. 2). The GARFIELD test for enrichment of genome-wide loci (with P < 1 × 10−8) in DNase I hypersensitive sites (DHS broad peaks) found evidence of enrichment (P < 0.00012) in 41 blood tissue analyses, a greater signal than in any other tissue or cell type (Fig. 2a and Supplementary Data 7). The strongest enrichment (OR > 5.5 and P < 1 × 10−10) was seen for T-cell, B-cell and natural killer lymphocytes (CD3+, CD4+, CD56+ and CD19+). As expected for AD, Th2 showed stronger enrichment (OR = 4.3, P = 1 × 10−8) than Th1 (OR = 2.3, P = 2 × 10−4). The strongest enrichment in tissue samples representing skin was seen for foreskin keratinocytes (OR = 2.0, P = 0.008), but this did not meet a Bonferroni-corrected P-value threshold (0.05/425 = 1 × 10−4).

Fig. 2: Cell type tissue enrichment analysis.

a GARFIELD enrichment analysis of open chromatin data. Plot shows enrichment for AD associated variants in DNase I Hypersensitive sites (broad peaks) from ENCODE and Roadmap Epigenomics datasets across cell types. Cell types are sorted and labelled by tissue type. ORs for enrichment are shown for variants at GWAS thresholds of P < 1 × 10−8 (black) and P < 1 × 10−5 (blue) after multiple-testing correction for the number of effective annotations. Outer dots represent enrichment thresholds of P < 1 × 10−5 (one dot) and P < 1 × 10−6 (two dots). Font size of tissue labels corresponds to the number of cell types from that tissue tested. b MAGMA enrichment analysis of gene expression data. Plot shows P-value for MAGMA enrichment for AD associated variants with gene expression from 54 GTEx ver.8 tissue types. The enrichment –log10(P-value) for each tissue type is plotted on the y-axis. The Bonferroni corrected threshold P = 0.0009 is shown as a dotted line and the 7 tissue types that meet this threshold are highlighted as red bars.

The most enriched tissue type in MAGMA gene expression enrichment analysis was whole blood (P = 2 × 10−14). Others that met our Bonferroni-corrected P-value (P < 0.0009) were spleen, EBV-transformed lymphocytes, sun-exposed and unexposed skin, small intestine and lung (Fig. 2b and Supplementary Data 8).

DEPICT cell-type enrichment analysis identified a similar set of enriched cell-types: blood, leucocytes, lymphocytes and natural killer cells, but with the addition that the strongest enrichment was seen for synovial fluid (P = 2 × 10−7), which may be due to its immune cell component.

The DEPICT pathway analysis found 420 GO terms with enrichment (FDR < 5%) amongst the genes from our GWAS loci (Supplementary Data 9). The pathway with the strongest evidence of enrichment was ‘hemopoietic or lymphoid organ development’ (P = 1 × 10−16). All terms with FDR < 5% are represented in Supplementary Fig. 6, where the terms are grouped according to similarity and the parent terms are labelled, illustrating the strong theme of immune system development and signalling.

Gene prioritisation and biological interpretation in silico

The top genes prioritised using our composite score from publicly available data for each of the established European AD loci are shown in Table 1 and Fig. 3a (and the evidence that makes up the prioritisation scores is shown in Supplementary Fig. 7). The top three prioritised genes at each independent locus are shown in Supplementary Data 10 and a summary of all evidence for all genes reviewed in silico is presented in Supplementary Data 11.

Fig. 3: Prioritised genes at GWAS loci.

Prioritised genes at known (a) and novel (b) loci. For each independent GWAS locus the top prioritised gene (or genes if they were tied) from our bioinformatic analysis is presented along with a bar representing the total evidence score for that gene. A more detailed breakdown of the constituent parts of this evidence score is presented in Supplementary Fig. 7 and the total evidence scores for the top 3 genes at each locus are presented in Supplementary Data 10. NB. There are some cases of two independent GWAS signals implicating the same gene.

In most cases the top prioritised gene had been implicated (in previous GWAS) or is only superseded marginally by an alternative candidate. One interesting exception is on chromosome 11, where MAP3K11 (with a role in cytokine signalling – regulating the JNK signalling pathway) is markedly prioritised over the previously implicated OVOL118 (involved in hair formation and spermatogenesis), although the prioritisation of MAP3K11 is predominantly driven by TWAS evidence in multiple cell types rather than colocalisation or other evidence.

There are three instances where multiple associations in the region implicate additional novel genes. Two are genes involved in TLR4 signalling: S100A9 (prioritised in addition to the established FLG and IL6R on chromosome 1) and AGER (prioritised in addition to HLA-DRA on chromosome 6). The third has a likely role in T-cell activation: CDC42SE2 (prioritised in addition to SLC22A5 on chromosome 5).

The top prioritised genes at each of the novel European loci are shown in Table 2 and Fig. 3b. Many are in pathways already identified by previous findings (e.g. cytokine signalling, specifically IL-23, antigen presentation and NF-kappaB proinflammatory response). At one locus, the index SNP, rs34215892, is a missense (Pro274Leu) mutation within the DOK2 gene, although this mutation is categorised as tolerated or benign by SIFT and PolyPhen. The genes with the highest prioritisation score amongst the novel loci were GPR132 (total evidence score = 24), NEU4 (score = 22), TNFRSF1B (score = 19) and RGS14 (score = 19), and each shows biological plausibility as a candidate for AD pathogenesis.

GPR132 is a proton-sensing transmembrane receptor, involved in modulating several downstream biological processes, including immune regulation and inflammatory response, as reported previously in an investigation of this protein’s role in inflammatory bowel disease19. The index SNP at this locus, rs7147439 (which was associated with Europeans, Latinos, Africans, but not Japanese), is an intronic variant within the GPR132 gene. The AD GWAS association at this locus colocalises with the eQTL association for GPR132 in several immune cell types (macrophages20, neutrophils21, several T-cell datasets22) as well as in colon, lung and small intestine in GTEx23. GPR132 has also been shown to be upregulated in lesional and nonlesional skin in AD patients, compared to skin from control individuals24,25. OpenTargets and POSTGAP both prioritise GPR132 for this locus.

The SNP rs62193132 (which showed consistent effects in European, Latino and Japanese individuals, but little evidence for association in African individuals, Supplementary Fig. 2), is in an intergenic region between NEU4 (~26 kb away) and PDCD1 (~4 kb away) on chromosome 2. NEU4 was the highest scoring in our gene prioritisation pipeline (score = 22). However, PDCD1 also scores highly (score = 18, Supplementary Data 10). NEU4 is an enzyme that removes sialic acid residues from glycoproteins and glycolipids, whereas PDCD1 is involved in the regulation of T cell function. The AD GWAS association at this locus colocalises with the eQTL for NEU4 in several monocyte and macrophage datasets22,26,27,28 as well as in the ileum, colon and skin23,29. The eQTL for PDCD1 also colocalises in monocytes and macrophages27,28 as well as T-cells22, skin and whole blood23. In addition to the eQTL evidence, PDCD1 is upregulated in lesional and non-lesional skin in AD patients compared to skin from control individuals24,25. OpenTargets and PoPs prioritise NEU4, whilst POSTGAP prioritises PDCD1 at this locus.

TNFRSF1B is part of the TNF receptor, with an established role in cytokine signalling. rs61776548 (which showed consistent associations across all major ancestries tested) is 136 kb upstream of TNFRSF1B, actually within an intron of MIIP. MIIP encodes Migration and Invasion-Inhibitory Protein, which may function as a tumour suppressor. However, TNFRSF1B is a stronger candidate gene since the AD GWAS association at this locus colocalises with the eQTL for TNFRSF1B in T cells22,30, macrophages20, fibroblasts31 and platelets29. Furthermore, TNFRSF1B gene expression and the corresponding protein are upregulated in lesional and nonlesional skin compared to controls24,25,32 and the PoPs method prioritised this gene at this locus.

RGS14 is a multifunctional cytoplasmic-nuclear shuttling protein which regulates G-protein signalling, but whose role in the immune system is yet to be established. rs4532376 is 10.5 kb upstream of RGS14 and within an intron of LMAN2. The AD GWAS association at this locus colocalises with the eQTL for RGS14 in macrophages20, CD8 T-cells22, blood33 and colon23. RGS14 has also been shown to be upregulated in lesional skin of AD cases compared to skin from control individuals25 and DEPICT prioritises this gene. However, at this locus LMAN2 is also a reasonably promising candidate (score=15) based on colocalisation and differential expression evidence (Supplementary Data 11). OpenTargets and POSTGAP prioritise this alternative gene at this locus and it is possible that genetic variants at this locus influence AD risk through both genetic mechanisms.

We did not include the 3 novel variants from the multi-ancestry analysis in the comprehensive gene prioritisation pipeline because the available resources used predominantly represent European samples only. We did however investigate these variants using Open Targets Genetics, to identify any evidence implicating specific genes at these loci. rs9247 is a missense variant in INPP5D, encoding SHIP1, a protein that functions as a negative regulator of myeloid cell proliferation and survival. The INPP5D gene has been implicated in hay fever and/or eczema5 and other epithelial barrier disorders including inflammatory bowel disease. rs7773987 is intronic for AHI1 (Abelson helper integration site 1) which is involved with brain development but expressed in a range of tissues throughout the body; single cell analysis in skin shows expression in multiple cell types including specialised immune cells and keratinocytes, but the highest abundance is in endothelial cells (data available from v21.1 proteinatlas.org). The closest genes to rs34599047 are ATG5 (involved in autophagic vesicle formation) and PRDM1 (which encodes a master regulator of B cells).

Network analysis

STRING network analysis of the 70 human proteins encoded by genes listed in Tables 1 and 2 showed a protein-protein interaction (PPI) enrichment p-value < 1 × 10−16. The five most highly significant (FDR P = 1 × 10−9) Gene Ontology (GO) terms for biological process relate to immune system activation and regulation (Supplementary Data 12). The network described by the highly enriched term ‘Regulation of immune system process’ (GO:0002682) is shown in Fig. 4.

Fig. 4: Predicted interaction network of proteins encoded by the top prioritised genes from known and novel European GWAS loci.

Protein-protein interaction analysis carried out in STRING v11.5; nodes coloured red represent the GO term ‘Regulation of immune system process’ (GO:0002682) for which 28/1514 proteins are included (FDR P = 1 × 10−9). Full results for all identified pathways are available in Supplementary Data 12.

Extending the network to include the less well characterised genes/proteins from the multi-ancestry analysis further strengthened this predicted network: the PPI enrichment was again P < 1 × 10−16 and ‘Regulation of immune system process’ was the most enriched term (FDR P = 5 × 10−13).

Discussion

We present the results of a comprehensive genome-wide association meta-analysis of AD in which we have identified a total of 91 associated loci. This includes 81 loci identified amongst individuals of European ancestry replicated in a further sample of 2.9 million European individuals (as well as many showing replication in data for other ancestries). Of the additional 10 loci identified in a multi-ancestry analysis, 8 replicated in at least one of the populations tested (European, Latino and African ancestry) and a further 2 may be specific to individuals of East Asian ancestry (but require replication).

The majority of the loci associated with AD are shared between the ancestry groups represented in our data, though there were some notable exceptions. We report two previously identified loci with associations that appear to be specific to the Japanese cohort (although driven by just one cohort and still require independent replication). Whilst these have been previously reported8, this used the same data as examined here. However, rs59039403 within NLRP10 is a likely deleterious missense mutation at reasonable frequency in Japanese (13%) that is present at a far lower frequency (<1%) in Europeans. Equally, previous further investigation of the association near CCDC80 found a putative functional variant (rs12637953) that affects the expression of an enhancer (associated with CCDC80 promoter) in epidermis and Langerhans cells8, increasing the evidence that these Japanese-specific loci are real. Furthermore, we have identified several loci with association in Europeans (many of which also showed association in individuals of Japanese or Latino ancestry) but which showed no evidence of association in individuals of African ancestry. It is tempting to speculate, using our knowledge of the differing AD phenotypes between European, Asian and African people34,35 that the differing genetic associations at some loci may contribute to these clinical observations. rs7773987 within an intron of AHI1 may, for example, indicate a mechanism contributing to neuronal sensitisation leading to the marked lichenification and nodular prurigo-type lesions36 that characterise AD in some people of African and European ethnicities37. Large-scale population cohorts (as used here) have been useful for identifying associated variants. However, we do note that the variants identified should be further examined with respect to specific aspects of AD (age of onset, severity and longitudinal classes38) in future analysis.

The dominance of blood as the tissue showing most enrichment of our GWAS signals in regions of DNase hypersensitivity and of eQTLs suggests the importance of systemic inflammation in AD, and this is in keeping with knowledge of the multisystem comorbidities associated with AD39. The dominance of blood also supports the utility of this easily accessible tissue when characterising genetic risk mechanisms, and for the measurement of biomarkers for many of the implicated loci. However, skin tissue also showed enrichment and there are likely to be some genes for which the effect is only seen in skin. For example, we know that two genes previously implicated in AD, FLG and CD2072,18, are predominantly expressed in the skin. In our gene prioritisation investigations there was no evidence from blood linking FLG to the rs61816766 association, and only one analysis of monocytes separated from peripheral blood mononuclear cell (PBMC) samples28 implicated CD207 for the rs112111458 association, amongst an abundance of evidence from skin for both genes playing a role in AD (Supplementary Data 11). So, whilst the enrichment analysis suggests blood as a useful tissue for genome scale studies of AD and a reasonable tissue to include for further investigation at specific loci, it does not preclude skin as the more relevant tissue for a subset of important genes.

At many of the loci identified in this GWAS, our gene prioritisation analysis, as well as the DEPICT pathway analysis, implicated genes from pathways that are already known to have a role in AD pathology. The overwhelming majority of these are in pathways related to immune system function; STRING network analysis highlighted the importance of immune system regulation, in keeping with an increasing awareness of the importance of balance in opposing immune mechanisms that can cause paradoxical atopic or psoriatic skin inflammation40. Whilst our in silico analyses cannot definitively identify specific causal genes (rather, we present a prioritised list of all genes at each locus along with the corresponding evidence for individual evaluation), it is of note that for many of the previously known loci (Table 1) our approach identifies genes which have been validated in experimental settings, e.g. FLG41, TNF42 and IL2243. The individual components of the gene prioritisation analysis have their limitations, particularly the high probability that findings, whilst demonstrating correlation, do not necessarily provide evidence for a causal relationship. This has been particularly highlighted with respect to colocalisation of GWAS and eQTL associations, where high co-regulation can implicate many potentially causal genes44. Another limitation is that only cell types (and conditions) that have been studied and made available are included in the in silico analysis, and gaps in the data may prove crucial. However, we believe this broad-reaching review of complementary datasets and methods is a useful initial approach to summarise the available evidence, prioritise genes for follow-up and provide information to inform functional experiments. The best evidence is likely to be produced from triangulation of multiple experiments and/or datasets and we have presented our workflow and findings in a way to allow readers to make their own assessments. 

Another important limitation of our gene prioritisation is that we only undertook the comprehensive approach for loci associated in European individuals, given that the majority of datasets used come from (and may only be relevant for) European individuals. Expansion of resources that allow for similarly comprehensive follow-up of GWAS loci in individuals of non-European ancestry is urgently needed45. However, we do report some evidence that implicates certain genes at loci from our multi-ancestry analysis, whilst noting that these require further investigation in appropriate samples from representative populations.

Amongst the genes prioritised at the novel loci identified in this study, four are targets of existing drugs (and have the required direction of action consistent with the AD risk allele’s direction of effect on the gene expression) as reported by Open Targets46: CSF1 is targeted by a macrophage colony-stimulating factor 1 inhibiting antibody (in phase II trials as cancer therapy but also for the treatment of rheumatoid arthritis and cutaneous lupus); CTSS is targeted by a small molecule cathepsin S inhibitor (in phase I-II trials for coeliac disease and Sjogren syndrome); IL15 is targeted by an anti-IL-15 antibody (in phase II trials for autoimmune conditions including vitiligo and psoriasis); and MMP12 is targeted by small molecule matrix metalloprotease inhibitors (in phase III studies for breast and lung cancer, plus phase II for cystic fibrosis and COPD)47. These may offer valuable drug repurposing opportunities.

We have presented the largest GWAS of AD to date, identifying 91 robustly associated loci, 22 with some evidence of population-specific effects. This represents a significant increase in knowledge of AD genetics compared to previous efforts, taking the number of GWAS hits identified in a single study from 31 to 91 and making available the well-powered summary statistics to enable many future important studies (e.g. Mendelian Randomization to investigate causal relationships). To aid translation we have undertaken comprehensive post-GWAS analyses to prioritise potentially causal genes at each locus, implicating many immune system genes and pathways and identifying potential novel drug targets.

Methods

Appropriate ethical approval was obtained for all cohorts by their ethics committees as detailed in the Supplementary Methods.

Phenotype definition

Cases were defined as those who have “ever had atopic dermatitis”, according to the best definition for the cohort, where doctor-diagnosed cases were preferred. Controls were defined as those who had never had AD. Further details on the phenotype definitions for the included studies can be found in Supplementary Methods and Supplementary Data 2.

GWAS analysis and quality control of summary data

We performed genome-wide association analysis (GWAS) for AD case-control status across 40 cohorts including 60,653 AD cases and 804,329 controls of European ancestry. We also included cohorts with individuals of mixed ancestry (Generation R), as well as Japanese (Biobank Japan), African American (SAGE II and SAPPHIRE) and Latino (GALA II) studies, giving a total of 65,107 AD cases and 1,021,287 controls.

Genetic data was imputed separately for each cohort with the majority of European cohorts using the haplotype reference consortium (HRC version r1.1) reference panel48 (imputed with either the Michigan or Sanger server). 8 European and 2 non-European cohorts instead used the 1000 Genomes Project Phase 1 reference panel for imputation. GWAS was performed separately for each cohort while adjusting for sex and ancestry principal components derived from a genotype matrix (as appropriate in each cohort). Genetic variants were restricted to a MAF > 1% and an imputation quality score > 0.5 unless otherwise specified in the Supplementary Methods. In order to robustly incorporate cohorts with small sample sizes, we applied additional filtering based on the expected minor allele count (EMAC) as previously demonstrated49. EMAC combines information on sample size, MAF and imputation quality (2*N*MAF*imputation quality score) and a threshold of >50 EMAC was used to include variants for all cohorts. QQ-plots and Manhattan plots for each cohort were generated and visually inspected as part of the quality control process.
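The variant filters described above can be sketched as follows. This is an illustrative example rather than the pipeline used in the study: the function names, the dictionary layout and the two example variants are hypothetical, while the EMAC formula (2 × N × MAF × imputation quality score) and the thresholds (MAF > 1%, imputation quality > 0.5, EMAC > 50) are taken from the text.

```python
def expected_minor_allele_count(n_samples, maf, info):
    """EMAC = 2 * N * MAF * imputation quality score."""
    return 2 * n_samples * maf * info


def passes_variant_qc(n_samples, maf, info,
                      min_maf=0.01, min_info=0.5, min_emac=50.0):
    """Apply the MAF, imputation quality and EMAC thresholds from the text."""
    return (
        maf > min_maf
        and info > min_info
        and expected_minor_allele_count(n_samples, maf, info) > min_emac
    )


# Hypothetical variants in a hypothetical cohort of 1,000 individuals
cohort_size = 1000
variants = [
    {"id": "var_a", "maf": 0.05, "info": 0.9},  # EMAC of about 90: kept
    {"id": "var_b", "maf": 0.02, "info": 0.6},  # EMAC of about 24: dropped
]
kept = [v["id"] for v in variants
        if passes_variant_qc(cohort_size, v["maf"], v["info"])]
```

The second variant passes the MAF and imputation quality filters on their own but is removed by the EMAC threshold, which is the situation the additional filter is intended to catch in small cohorts.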

Meta-analysis

For the discovery phase, meta-analysis of the European cohorts was performed with GWAMA47 for 12,147,822 variants assuming fixed effects, while the multi-ancestry analysis of all cohorts was conducted in MR-MEGA50 (which models the heterogeneity in allelic effects that is correlated with ancestry). The latter included only 8,684,278 variants as MR-MEGA excludes variants where the number of contributing cohorts is less than 6. P < 5 × 10−8 was used to define genome-wide significance. Clumping was performed (in PLINK 1.9051) to identify independent loci. We formed clumps of all SNPs within ±500 kb of each index SNP with a linkage disequilibrium r2 > 0.001. Only the index SNP within each clump is reported. For multi-ancestry index variants within 500 kb of index SNPs identified in the European-only analysis, we considered these to be independent if the lead multi-ancestry SNP was not in LD (r2 < 0.01) with the lead neighbouring European variant. Multi-ancestry fixed effect meta-analysis was also performed for comparison with the MR-MEGA results.
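For readers unfamiliar with fixed-effects meta-analysis, the calculation for a single variant can be sketched as below. This assumes the standard inverse-variance weighted estimator and a two-sided normal test; it is not the GWAMA source code, and the function name and per-cohort values are hypothetical.

```python
import math


def fixed_effects_meta(betas, ses):
    """Inverse-variance weighted fixed-effects meta-analysis for one variant.

    betas: per-cohort effect estimates (e.g. log odds ratios)
    ses:   per-cohort standard errors
    Returns the pooled effect, its standard error and a two-sided P-value.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided P-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, p_value


# Hypothetical per-cohort estimates for a single variant
beta, se, p = fixed_effects_meta([0.10, 0.10], [0.02, 0.02])
genome_wide_significant = p < 5e-8
```

Because each cohort is weighted by the inverse of its variance, larger and better-imputed cohorts contribute more to the pooled estimate, and the pooled standard error shrinks as cohorts are added.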

Known/Novel assignment

Novel associations are defined as index SNPs that had not been reported in a previous GWAS (Supplementary Data 1) and were not correlated (r2 < 0.1 in the relevant ancestry) with a known SNP from this list. In addition, following the assignment of genes to loci (see gene prioritisation), any locus annotated with a gene that has been previously reported was also moved to the ‘known’ list. Some loci which are reported in Open Targets52,53 (but not in a published AD GWAS) have therefore been classed as novel. These loci are marked as such in Table 2.

Conditional analysis

Conditional analysis was performed to identify any independent secondary associations in the European meta-analysis. Genome-wide complex trait analysis-conditional and joint analysis (GCTA-COJO54) was used to test for independent associations 250 kb either side of the index SNPs using UK Biobank HRC imputed data as the reference. COJO-slct was used to determine which SNPs in the region were conditionally independent (using default P < 1 × 10−5) and therefore represent independent secondary associations. COJO-cond was then used to condition on the top hit in each region to determine the conditional effect estimates.

Replication

The genome-wide significant index SNPs identified from the European and multi-ancestry discovery meta-analyses were taken forward for replication in 23andMe, Inc. Individuals of European (N = 2,904,664), Latino (N = 525,348) and African ancestry (N = 174,015) were analysed separately. Full details are available in the Supplementary Methods.

LD score regression

Linkage disequilibrium score (LDSC) regression software (version 1.0.1)55 was used to estimate the SNP-based heritability (h2SNP) for AD. This was performed with the summary statistics of the European discovery meta-analysis. The h2SNP was estimated on the liability scale with a population prevalence of 0.15 and a sample prevalence of 0.070.
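For readers unfamiliar with liability-scale estimates, the sketch below shows the commonly used conversion from observed-scale to liability-scale heritability for an ascertained case-control sample, written in Python. We assume here that this standard transformation is the one applied; the function names are hypothetical and the observed-scale input is left to the caller, since no observed-scale value is given in this section.

```python
from statistics import NormalDist


def liability_scale_factor(population_prevalence, sample_prevalence):
    """Multiplier converting observed-scale h2 to liability-scale h2.

    factor = K^2 (1 - K)^2 / (P (1 - P) z^2), where K is the population
    prevalence, P the proportion of cases in the sample, and z the height
    of the standard normal density at the liability threshold for K.
    """
    standard_normal = NormalDist()
    threshold = standard_normal.inv_cdf(1.0 - population_prevalence)
    z = standard_normal.pdf(threshold)
    k, p = population_prevalence, sample_prevalence
    return (k ** 2 * (1.0 - k) ** 2) / (p * (1.0 - p) * z ** 2)


def observed_to_liability(h2_observed, population_prevalence=0.15,
                          sample_prevalence=0.070):
    """Rescale an observed-scale h2 using the prevalences stated in the text."""
    return h2_observed * liability_scale_factor(population_prevalence,
                                                sample_prevalence)
```

Because the sample prevalence (0.070) is lower than the assumed population prevalence (0.15), the conversion factor depends on both values rather than on the population prevalence alone.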

Genetic correlation with other traits was assessed using all the traits available on CTG-VL56 (accessed on 5th November 2021). We considered phenotypes with p-values below the Bonferroni-corrected alpha threshold (i.e., 0.05/1376 = 4 × 10−5) to be genetically correlated with AD (a conservative threshold given the likely correlation between many traits tested).

Bioinformatic analysis

For the following analyses, we defined the region within which the true causal SNP is expected to reside as the interval bounded by the most distant SNPs in LD with the index SNP (r2 ≥ 0.2) within ±500 kb of the index SNP18. We refer to such regions as locus intervals and used them as input for the analyses described below.
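The locus interval definition can be sketched as follows in Python. This is an illustrative reading of the definition above rather than the pipeline used in the study; the function name and the input format (a mapping from base-pair position to r2 with the index SNP, on a single chromosome) are assumptions.

```python
def locus_interval(index_position, r2_with_index, window=500_000, min_r2=0.2):
    """Return (start, end) of the locus interval around an index SNP.

    r2_with_index maps the base-pair position of each nearby SNP to its LD
    r2 with the index SNP. Only SNPs within +/- window of the index SNP and
    with r2 >= min_r2 are considered; the interval runs from the leftmost
    to the rightmost such SNP and always contains the index SNP itself.
    """
    positions = [
        position
        for position, r2 in r2_with_index.items()
        if abs(position - index_position) <= window and r2 >= min_r2
    ]
    positions.append(index_position)
    return min(positions), max(positions)
```

If no other SNP meets the r2 threshold, this sketch returns an interval containing only the index SNP position.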

Enrichment analysis

Enrichment of tissues, cell types and gene sets for AD GWAS loci was investigated using DEPICT57 and GARFIELD (GWAS analysis of regulatory or functional information enrichment with LD correction)58, run with default settings, as well as MAGMA59 v1.06 (using GTEx23 version 8 on the FUMA60 platform). In addition, we used MendelVar61, run with default settings, to check for enrichment of any ontology terms assigned to Mendelian disease genes within the locus intervals.

By default, MAGMA only assigns variants within genes. DEPICT maps all genes within a given LD (r2 > 0.5) boundary of the index variant. DEPICT gene set enrichment results for GO terms only were grouped (using the Biological Processes ontology) and displayed using the rrvgo package. The default scatter function was adapted to only plot parent terms62.

Prioritisation of candidate genes

To prioritise candidate genes at each of the loci identified in the European GWAS, we investigated all genes within ±500 kb of each index SNP (selected to capture an estimated 98% of causal genes)63. The approach used has been previously described by Sobczyk et al.18. For each gene we collated evidence from a range of approaches (as described below) to link SNP to gene, resulting in 14 annotation categories (represented as columns in Supplementary Fig. 7). We summarised these annotations for each gene into a score in order to prioritise genes at each locus. We present the top prioritised gene in the main tables, but strength of evidence varies and so we encourage readers to use our full evaluation (of all the evidence presented in Supplementary Data 11 for all genes at each locus) for loci of interest.

We tested for colocalisation with molecular QTLs, where full summary statistics were available, using the coloc64 method (with betas as input). We used the eQTL Catalogue65 and Open GWAS66 to download a range of eQTL datasets from all skin, whole blood and immune cell types as well as additional tissue types which showed enrichment for our GWAS loci, such as spleen and oesophagus mucosa18. A complete list of eQTL datasets20,21,22,23,26,27,28,29,30,31,33,67,68,69,70,71 is displayed in Supplementary Data 13. pQTL summary statistics for plasma proteins72 were downloaded from Open GWAS. An annotation was included in our gene prioritisation pipeline if there was a posterior probability >95% that the associations from the AD GWAS and the relevant QTL analysis shared the same causal variant.

Additional colocalisation methods were also applied. TWAS (Transcriptome-Wide Association Study)-based S-MultiXcan73 and SMR (Summary-based Mendelian Randomization)74 were run on datasets available via the CTG-VL platform (including GTEx tissue types and 2 whole blood pQTL72,75 datasets available for the SMR pipeline). For S-MultiXcan and SMR, we report only results with p-values below the Bonferroni-corrected alpha threshold and, for SMR, with no evidence of heterogeneity (HEIDI P-value > 0.05).

Genes were also annotated if they were included in any of the globally enriched ontology/pathway terms from the MendelVar analysis described above or if they were identified in direct look-ups of keywords: “skin”, “kera”, “derma” in their OMIM76 descriptions, or Human Phenotype Ontology77/Disease Ontology78 terms.

We also used machine learning candidate gene prioritisation pipelines (DEPICT57, PoPs79, POSTGAP80 and the Open Targets Genetics53 Variant 2 Gene mapping tool) as well as the gene-based MAGMA59 test. We added annotations to genes ranked in the top 3 by each of the pipelines.

We mined the literature for a list of differential expression studies and found 9 RNA-Seq/microarray plus 4 proteomic analyses involving comparisons of AD lesional25,32,81,82,83,84 or AD nonlesional24,25,32,82,85,86,87 skin vs healthy controls. Studies with comparisons of AD lesional acute vs chronic88, blood proteome in AD vs healthy control32 and FLG knockdown vs control in living skin-equivalent89 were also included. We annotated each gene (including direction of effect, i.e. upregulated/downregulated) with FDR < 0.05 in any dataset.

Lastly, we annotated genes where the index SNP resided within the coding region according to VEP (Variant Effect Predictor)90 analysis.

For each candidate gene, we established a pragmatic approach to combine all available evidence in order to prioritise the most plausible candidate gene(s) at each locus. This prioritisation was carried out as follows:

  • Annotations (each representing one piece of evidence) were summed across all methods and datasets to derive a ‘total evidence score’. For example, if coloc evidence was observed for 5 datasets for a particular gene, this would add 5 to the score for that gene.

  • Additionally, to assess whether evidence was coming from multiple datasets using the same method or from diverse approaches, we counted ‘evidence types’, summing up the methods (as opposed to datasets) with an annotation for each gene tested (up to a maximum of 14). In the same example of coloc evidence observed in 5 datasets, this would add 1 to this measure for this gene. Evidence types are represented by the columns in Supplementary Fig. 7.

  • In order to prioritise genes with the most evidence, whilst ensuring there was some evidence of triangulation across methods, at each locus we prioritised the gene with the highest ‘total evidence score’ among genes with at least 3 ‘evidence types’. The number of ‘evidence types’ was also used to break ties.
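The three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the function name and the input layout (for each gene, a mapping from method to the number of annotated datasets) are hypothetical, and the behaviour when two genes tie on both measures is not specified in the text, so the sketch falls back on gene name order purely to stay deterministic.

```python
def prioritise_gene(annotations_by_gene, min_evidence_types=3):
    """Pick the top candidate gene at one locus.

    annotations_by_gene maps gene -> {method: number of annotated datasets}.
    'Total evidence score' is the sum of annotations over all methods and
    datasets; 'evidence types' is the number of distinct methods with at
    least one annotation. Genes with fewer than min_evidence_types are not
    eligible. Returns (gene, total_score, evidence_types), or None if no
    gene at the locus is eligible.
    """
    eligible = []
    for gene, counts in annotations_by_gene.items():
        total_score = sum(counts.values())
        evidence_types = sum(1 for n in counts.values() if n > 0)
        if evidence_types >= min_evidence_types:
            eligible.append((total_score, evidence_types, gene))
    if not eligible:
        return None
    # Tuples compare by total score first, then evidence types (the stated
    # tie-break), then gene name (an arbitrary choice made for this sketch).
    total_score, evidence_types, gene = max(eligible)
    return gene, total_score, evidence_types
```

In this sketch, a gene with a high total score drawn from only two methods is passed over in favour of a gene with a lower score supported by three or more methods, which reflects the triangulation requirement described above.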

Network analysis

Network analysis of the prioritised genes was carried out using standard settings (minimum interaction score 0.4) in STRING91 v11.5.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.