## Introduction

A limitation of a traditional GWAS (e.g., a linear mixed model, LMM) of a complex trait is that it primarily captures one or a few risk factors with large effect sizes. In a complex trait, however, multiple risk factors with small to moderate effects are expected to jointly influence disease development. A traditional GWAS tests each variant one at a time as a fixed effect and does not account for linkage disequilibrium (LD) between variants. To account for multiple testing, a stringent p-value threshold is often used, which results in many false negatives, and the effect sizes of the variants declared significant may be overestimated. A Bayesian mixture model (BMM) estimates the effect sizes of all variants simultaneously and treats them as random effects, thereby accounting for LD between variants. This results in fewer false negatives and also gives unbiased estimates of the larger variant effects20. The BMM has been adapted to genome-wide studies of complex traits, e.g., in the application BayesR20,21. BayesR models variant effects as a mixture of four normal distributions with variances of 0, 0.01%, 0.1%, and 1% of the total genetic variance; the zero-effect component reflects the assumption that most variants have no measurable effect on a complex trait. BayesR detects more true positives relative to false positives than other methods20. The aim of the current study was to identify genetic risk factors for canine AD. Assuming that many risk factors with small-to-medium effects are involved in the disease pathogenesis, we applied the BMM BayesR methodology.
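To make the BayesR prior concrete, the sketch below encodes the four-component normal mixture described above, with component variances of 0, 10⁻⁴, 10⁻³, and 10⁻² times the genetic variance σ²g. It is a simplified illustration under stated assumptions, not BayesR itself: the mixing proportions `pi` are hypothetical fixed values (BayesR estimates them from the data), and the hypothetical helper `component_posterior` scores a single marginal effect estimate in isolation, ignoring LD and the Gibbs sampling that BayesR uses to fit all variants jointly.

```python
import math
import random

# Variance of each mixture component as a fraction of the genetic variance
# sigma2_g: a zero-effect component plus 0.01%, 0.1%, and 1%.
COMPONENT_SCALES = (0.0, 1e-4, 1e-3, 1e-2)


def sample_effects(n_snps, sigma2_g, pi, seed=1):
    """Draw SNP effects from the four-component normal mixture prior.

    pi: hypothetical mixing proportions, one per component.
    """
    rng = random.Random(seed)
    effects = []
    for _ in range(n_snps):
        k = rng.choices(range(4), weights=pi)[0]
        sd = math.sqrt(COMPONENT_SCALES[k] * sigma2_g)
        effects.append(rng.gauss(0.0, sd) if sd > 0 else 0.0)
    return effects


def component_posterior(b_hat, se, sigma2_g, pi):
    """Posterior probability of each component for ONE SNP, assuming its
    estimate satisfies b_hat ~ N(beta, se^2) and ignoring all other SNPs."""
    dens = []
    for p, scale in zip(pi, COMPONENT_SCALES):
        var = scale * sigma2_g + se ** 2  # marginal variance of b_hat
        dens.append(p * math.exp(-b_hat ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var))
    total = sum(dens)
    return [d / total for d in dens]
```

With hypothetical proportions such as `(0.95, 0.03, 0.015, 0.005)`, most sampled effects are exactly zero, which mirrors the assumption that most variants have no measurable effect.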

Dog breeds result from strong artificial selection for favored phenotypes. Homogeneity was further intensified by the subsequent closing of stud books, and within-breed selection favoring dogs with specific characteristics is still ongoing. The resulting selective sweeps are visible as a decrease in haplotype diversity caused by the rapid increase in allele frequencies at loci controlling the traits under selection. Hitchhiking is the unintended increase in the frequencies of alleles at nearby loci, which may control another trait or disease. A pleiotropic effect can also be expected when genes responsible for a desirable trait also affect other phenotypes. In small populations, such as dog breeds, genetic drift can also cause a loss of genetic variation. In this study, we performed a genome-wide scan for signatures of selection using the cross-population extended haplotype homozygosity test (XP-EHH)22 to investigate whether selection for certain breed characteristics has also led to the accumulation of risk variants for canine AD within any of the four studied breeds.
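XP-EHH compares the integrated extended haplotype homozygosity (iHH) at a site between two groups, here cases and controls. The following is a minimal sketch, assuming that per-site iHH values have already been computed for each group. The unstandardized score is ln(iHH_A/iHH_B). Scores are then z-standardized across sites, and a two-sided normal p-value is reported on the −log10 scale, which is the scale used for the threshold of 4 in this study. The function names are hypothetical, and dedicated packages may differ in details such as genome-wide versus per-chromosome standardization.

```python
import math
import statistics


def xpehh(ihh_a, ihh_b):
    """Standardized XP-EHH per site: z-score of ln(iHH_A / iHH_B)."""
    raw = [math.log(a / b) for a, b in zip(ihh_a, ihh_b)]
    mean, sd = statistics.mean(raw), statistics.stdev(raw)
    return [(r - mean) / sd for r in raw]


def minus_log10_p(z):
    """Two-sided p-value of a standard normal score, returned as -log10(p)."""
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return -math.log10(p)
```

Under this two-sided convention, −log10(p) > 4 corresponds to |z| above about 3.89.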

The purpose of the present study was to uncover the genetic complexity of canine AD in a novel manner to move beyond single-locus GWAS signals. We performed genetic mapping in four dog breeds predisposed to AD, using datasets consisting of samples from ~200–400 dogs per breed that were sampled in a joint international collection effort. We identify multiple disease risk loci and replicate, in the dog, the major genetic risk factor for human AD.

## Results

### Bayesian genome-wide association identifies fifteen AD-associated loci

Following quality control (QC) and relatedness filtering, the final datasets used for analyses consisted of 321 LR (178 cases and 143 controls), 256 GR (143 cases and 113 controls), 219 GSD (106 cases and 113 controls), and 235 WHWT (137 cases and 98 controls), with imputed marker sets of ~400–600 K variants (Supplementary Tables 1 and 2 and Supplementary Fig. 1). Using BayesR, we identified a total of 15 AD-associated loci: 11 in LR, one each in GR and GSD, and two in WHWT (Fig. 1a, b, e, f). Variants with an absolute effect size ≥0.0001 were defined as effect variants, and AD-associated loci were defined as regions harboring effect variants located <1 Mb apart (Table 1 and Supplementary Data 1). In LR, the three associated loci harboring the variants with the highest effect sizes were located on chromosome 34 (the top effect variant was ARL14 intronic), chromosome 4 (ITGA1/ISL1 intergenic), and chromosome 36 (UBE2E3/ITGA4 intergenic). The associated locus in GR was on chromosome 23 (SCN5A intronic), and the one in GSD was on chromosome 9 (ABCA9 intronic). The two loci in WHWT were on chromosomes 10 (HMGA2/LLPH intergenic) and 15 (C4orf45 intronic). The sum of risk alleles across the 11 loci (i.e., the risk index) differed between LR cases and controls (two-sided t-test p = 1.52 × 10−22, t-statistic = 10.6, n = 321 dogs; Fig. 2 and Supplementary Data 2), and the risk index explained 26.4% of the AD variance in LR. When each associated locus was modeled separately, the risk loci explained 32.8% of the total variance, with the largest contribution from chromosome 3 (7.3%). Principal component (PC)1, which captures the first dimension of the relationship matrix (PCA plot, Supplementary Fig. 1), contributed 9.3% of the total AD variance in the risk index model and 5.2% when loci were modeled separately in LR (Supplementary Table 3).
The associated loci in GR and GSD each explained 2.3% of the AD variance in their respective breed, and the risk index for the two loci in WHWT explained 16.1% of the disease variance (Supplementary Table 4). In both GR and GSD, the PCs had a large influence on AD variance (in total 18.7% and 9.7% for PC1–3, respectively), whereas their contribution was low in WHWT (3.5% for PC1–2).
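The locus definition and risk index described above can be sketched as follows. Several assumptions are not stated in the text and should be treated as such. Loci are formed by single linkage: an effect variant joins the current locus if it lies <1 Mb from the previous effect variant on the same chromosome. Genotypes are coded as risk-allele counts (0, 1, 2). "Variance explained" is approximated by the R² of a simple linear regression of case/control status on the risk index, without the PC covariates used in the study. All function names are hypothetical.

```python
import statistics


def define_loci(variants, min_abs_effect=1e-4, max_gap_bp=1_000_000):
    """Group effect variants (|effect| >= min_abs_effect) into loci.

    variants: iterable of (chrom, pos_bp, effect) tuples. An effect variant
    joins the current locus if it lies on the same chromosome and less than
    max_gap_bp after the previous effect variant (single linkage).
    """
    hits = sorted(v for v in variants if abs(v[2]) >= min_abs_effect)
    loci = []
    for chrom, pos, eff in hits:
        last = loci[-1] if loci else None
        if last and last["chrom"] == chrom and pos - last["end"] < max_gap_bp:
            last["end"] = pos
            last["n_variants"] += 1
            if abs(eff) > abs(last["top_effect"]):
                last["top_pos"], last["top_effect"] = pos, eff
        else:
            loci.append({"chrom": chrom, "start": pos, "end": pos, "n_variants": 1,
                         "top_pos": pos, "top_effect": eff})
    return loci


def risk_index(risk_allele_counts):
    """Risk index per dog: sum of risk-allele counts (0, 1 or 2) over loci."""
    return [sum(dog) for dog in risk_allele_counts]


def r_squared(x, y):
    """R^2 of a simple linear regression of y (e.g. 0/1 AD status) on x."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)
```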

### Selection analyses identify eight candidate regions

The imputed datasets from 321 LR, 256 GR, 219 GSD, and 235 WHWT were also used in the XP-EHH analyses to detect selection signatures in cases versus controls within each breed (Supplementary Tables 1–2). In LR, GSD, and WHWT, a total of eight candidate regions under selection (XP-EHH regions) were identified. Regions were defined by a scan with 1 Mb windows overlapping by 0.1 Mb, requiring at least two variants with XP-EHH −log10(p) above 4 (Fig. 1, Table 2, and Supplementary Fig. 2). We investigated the potential functionality of selection variants, i.e., variants with XP-EHH −log10(p) ≥4.0 (N = 1471), by extracting their phyloP29 scores (Supplementary Data 7). We found that 12 selection variants, on chromosomes 3, 10, 19, and 32, were positioned at constrained sites (phyloP > 2.56, i.e., showing a high level of conservation across 240 mammalian species and thereby a likely functional position). One variant was exonic and the rest were intronic; all were located within the genes nearest to the top selection variants of their regions (Table 2 and Supplementary Table 6). The variant with the highest phyloP score, 7.0, was exonic in LRP1B on chromosome 19; it is a missense variant, XM_038565111.1:p.(Tyr42His; SnpEff v 4.3.t30), that exists in multiple dog breeds31. SnpEff predicted a moderate impact for this exonic LRP1B variant, and SIFT32,33 predicted the substitution at amino acid position 42 to be tolerated, with a score of 0.22 (SIFT scores range from 0 to 1; an amino acid substitution is predicted to be damaging if the score is ≤0.05 and tolerated if the score is >0.05).
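The window scan used to define XP-EHH regions can be sketched for one chromosome as below. We read "1 Mb windows overlapping by 0.1 Mb" as 1 Mb windows advancing in 0.9 Mb steps. That interpretation, the merging of overlapping flagged windows into a single region, and the function name are all assumptions.

```python
def selection_regions(positions, scores, window=1_000_000, overlap=100_000,
                      threshold=4.0, min_hits=2):
    """Scan one chromosome with overlapping windows and merge the windows that
    hold at least min_hits variants with -log10(p) above threshold.

    positions: variant positions (bp); scores: XP-EHH -log10(p) per variant.
    Returns a list of (start, end) regions in bp.
    """
    step = window - overlap
    hits = sorted(p for p, s in zip(positions, scores) if s > threshold)
    if not hits:
        return []
    flagged = []
    start = 0
    while start <= hits[-1]:
        end = start + window
        if sum(1 for p in hits if start <= p < end) >= min_hits:
            flagged.append([start, end])
        start += step
    merged = []
    for s, e in flagged:  # windows are already in increasing order
        if merged and s <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], e)
        else:
            merged.append([s, e])
    return [tuple(r) for r in merged]
```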

### Selection signal in Labrador retriever cases targets the TBC1D1 gene

Population substructure was discernible in the relationship matrix of LR (Fig. 4a and Supplementary Fig. 1a). Using information from Swedish LR kennels, questionnaires from the UK, and coat color information from all LR, we concluded that PC1 likely captured a breed type division caused by selection for a gundog versus a common type of LR. Gundogs were more often found in the low PC1 cluster, subsequently referred to as the gundog type, and the cluster with high PC1 values was considered the common type (Supplementary Fig. 3). The 115 selection variants on chromosome 3 were positioned across the TBC1D1 gene, and the top selection variant, chr3:74,218,744 (chr3:sel), was intronic to TBC1D1 (Fig. 4b). Allele C at chr3:sel was more frequent in the common type (Fig. 4a). TBC1D1 is known for its association with body weight in humans34,35, pigs36, mice37, rabbits38, and chickens39. A stockier body is typically observed in common type LR, whereas the gundog type is generally thinner, as illustrated in Fig. 4a. From the extended haplotype homozygosity (EHH) plot, we observed a higher integrated EHH (iHH; corresponding to the average haplotype length) for allele C at chr3:sel in cases (618 kb) than in controls (205 kb; Fig. 4d, e). Within the extended region estimated from the EHH plot for allele C in cases (Fig. 4d), AD-associated variants were identified using the plink allelic association test (chi-square) and logistic regression models (Fig. 4f). LD was pronounced between the risk alleles at chr3:assocA (also an effect variant in the BayesR analysis of LR) through chr3:assocD and allele C at chr3:sel; this haplotype had a frequency of 57.2% in the whole LR population, whereas the frequencies of the remaining nine haplotypes ranged from 1.0 to 9.9% (Fig. 4g).
Although selection is likely acting on the chr3:sel locus, the association with canine AD was stronger for the chr3:assocA-chr3:assocB-chr3:assocC risk haplotype CCG. The chr3:sel genotype explained 25.1% of the PC1 variance and 4.4% of the AD variance, whereas CCG explained 18.4% of the PC1 variance and 7.6% of the AD variance. The CCG frequency was 76%, whereas the frequencies of the other five haplotypes ranged from 1.5 to 8.8% (Fig. 4h). Among the 178 AD cases, 129 (72.5%) were homozygous for CCG, compared with 68 of 143 (47.6%) controls (Fig. 4i and Supplementary Data 8). When the dogs were divided into subpopulations using a cutoff at PC1 = −0.05 (gundogs PC1 < −0.05 and common type PC1 > −0.05), it became clear that a large proportion of common type cases were homozygous for CCG and that the CCG frequency was associated with AD in the common type (χ2 = 17.5, p = 2.81 × 10−5, n = 245 dogs; Fig. 6a and Supplementary Data 9).
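The iHH values compared above (e.g., 618 kb in cases versus 205 kb in controls for allele C at chr3:sel) integrate the EHH curve over distance. The following is a minimal sketch, assuming phased haplotypes stored as lists of alleles in marker order. EHH is the probability that two randomly chosen haplotypes carrying the core allele are identical from the core to a given marker. iHH integrates EHH with the trapezoid rule on both sides of the core until EHH drops below a cutoff; the cutoff of 0.05 used here is a hypothetical choice. The function names are hypothetical, and published tools handle edge cases, such as missing data and truncation at the cutoff, differently.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core, stop):
    """EHH between marker indices core and stop, for haplotypes (lists of
    alleles) that already share the same allele at the core marker."""
    lo, hi = min(core, stop), max(core, stop)
    groups = Counter(tuple(h[lo:hi + 1]) for h in haplotypes)
    n = len(haplotypes)
    return sum(comb(k, 2) for k in groups.values()) / comb(n, 2)


def ihh(haplotypes, positions, core, allele, cutoff=0.05):
    """Integrated EHH (bp) for one core allele: trapezoid-rule area under the
    EHH curve on both sides of the core, stopping once EHH < cutoff."""
    carriers = [h for h in haplotypes if h[core] == allele]
    if len(carriers) < 2:
        return 0.0
    area = 0.0
    for direction in (-1, 1):
        prev_ehh, i = 1.0, core
        while 0 <= i + direction < len(positions):
            nxt = i + direction
            cur = ehh(carriers, core, nxt)
            if cur < cutoff:
                break
            area += (prev_ehh + cur) / 2 * abs(positions[nxt] - positions[i])
            prev_ehh, i = cur, nxt
    return area
```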

### Selection signal in German shepherd controls across the LRP1B gene

A division of the GSD breed into two subpopulations is visible in the PCA plot. We assigned the subpopulations to the working type (PC1 < 0) and the show type (PC1 > 0) based on the following information: GSD from kennels with a higher proportion of working merits relative to show merits were more common in the cluster with low PC1 values and vice versa, and GSD with black or gray coat color (typically observed among working type GSD) were almost exclusively present in the low PC1 cluster (Fig. 5a and Supplementary Fig. 4). A selection signal consisting of 1078 selection variants was detected across the LRP1B gene on chromosome 19 in GSD (Fig. 5b). The top selection variant, chr19:44,248,511 (chr19:sel), was located in the first intron of LRP1B (according to canFam4 and hg38), and a higher iHH was estimated for allele T in controls (6.47 Mb) than in cases (3.18 Mb; Fig. 5d, e). The association with AD was strongest around the LRP1B gene, but it was lost in the logistic regression model including covariates (Fig. 5f). Allele T at chr19:sel was more frequent in the working type than in the show type (Fig. 5a), and chr19:sel explained ~15.2% of the PC1 variance, which accounts for the loss of AD association in the logistic regression model when correcting for PC1. The proportion of cases was higher in the show type (63.6%) than in the working type (33.0%), and AD status explained ~12.5% of the PC1 variance, indicating that the risk of AD differs between GSD breed types, as we suggested previously12. The T/T genotype at chr19:sel was common among working type controls, and the allele frequency at chr19:sel was associated with AD in the working type (χ2 = 5.21, p = 0.0224, n = 107 dogs; Fig. 6b and Supplementary Data 9).
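The χ² tests reported for the CCG haplotype in common type LR and for chr19:sel in working type GSD compare allele or haplotype counts between cases and controls. Below is a minimal sketch of a 1-df Pearson χ² test on such a 2×2 count table. It assumes no continuity correction, which the text does not specify. The sketch also shows one way to express the share of PC1 variance explained by a genotype: the R² of a model with genotype as a categorical factor. Whether the study coded genotypes as factors or as additive dosages is not stated, and the function names are hypothetical.

```python
import math
import statistics
from collections import defaultdict


def allelic_chi2(case_counts, control_counts):
    """Pearson chi-square (1 df, no continuity correction) for a 2x2 table of
    (n_risk, n_other) allele or haplotype counts in cases and controls."""
    table = [list(case_counts), list(control_counts)]
    total = sum(map(sum, table))
    rows = [sum(r) for r in table]
    cols = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p


def variance_explained_by_genotype(genotypes, pc1):
    """Share of PC1 variance explained by genotype class (e.g. 'T/T'),
    i.e. between-group sum of squares over total sum of squares."""
    grand = statistics.mean(pc1)
    groups = defaultdict(list)
    for g, v in zip(genotypes, pc1):
        groups[g].append(v)
    ss_between = sum(len(v) * (statistics.mean(v) - grand) ** 2 for v in groups.values())
    ss_total = sum((v - grand) ** 2 for v in pc1)
    return ss_between / ss_total
```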

The remaining XP-EHH regions were located on chromosome 5 (LR), chromosome 20 (GSD), and chromosomes 1, 10, 26, and 32 (WHWT; Supplementary Figs. 5–7).

### Genes in canine AD loci indicate joint pathways

Using the UCSC browser (canFam4), we extracted 275 gene ID names and 268 transcripts with unassigned gene names in BayesR regions (±1 Mb from effect variants), and 140 gene IDs and 130 transcripts with unassigned gene names in XP-EHH regions (Supplementary Data 10–11). Using Homo sapiens as the reference in STRING resulted in 193 recognized genes in BayesR regions and 136 genes in XP-EHH regions, whereas the reference Canis lupus familiaris resulted in 252 and 126 genes in BayesR and XP-EHH regions, respectively.

BayesR genes generated 20 significant terms (FDR < 0.05) in STRING (Homo sapiens) (Supplementary Data 12). The most relevant was the SMART term integrin alpha (beta-propellor repeats), which included four ITGA genes (19 genes in the background count). ITGA1 (chromosome 4) and ITGA4 (chromosome 36) were located 995 kb and 320 kb, respectively, from the top effect variants of the second- and third-ranked associated loci in LR. ITGA10 lies within the ~2 Mb associated locus in LR on chromosome 17, and ITGA9 is located ~500 kb from the top effect variant on chromosome 23 in GR, where an ITGA9 intronic variant had an effect size of 0.000099. BayesR genes in STRING (Canis lupus familiaris) yielded no significant enrichments. Genes under putative selection in STRING (Homo sapiens) yielded three terms related to leukemia (Supplementary Table 7). The genes in the leukemia cell line term were AFDN (alias MLLT4, chromosome 1, WHWT), KLF3 (chromosome 3, LR), RPS6 (chromosome 5, LR), and FCER2, MCOLN1, and PRAM1 on chromosome 20 (GSD). Genes under putative selection in STRING (Canis lupus familiaris) yielded the significant GO Component term Phagocytic vesicle, represented by the genes APPL2 (chromosome 10, WHWT), TLR1/TLR6 (chromosome 3, LR), and RAB11A, RAB11B, and STXBP2 on chromosome 20 (GSD). The additional significant terms were four STRING cluster terms, represented by genes from only one or two regions (Supplementary Table 8).

Combining genes from BayesR and XP-EHH regions in STRING (Homo sapiens) resulted in 23 significant terms (Supplementary Data 13). The most relevant was the TISSUES term connective tissue, represented by 37 genes (871 background genes). Seven BayesR regions (chromosomes 3, 4, 5, 10, 17, 23, and 34) and five XP-EHH regions (chromosomes 1, 5, 10, 20, and 32) were represented by the genes in this network (Supplementary Table 9). Genes from BayesR and XP-EHH regions together in STRING (Canis lupus familiaris) resulted in the significant GO Process term MyD88-dependent toll-like receptor signaling pathway (FDR = 0.004), represented by six genes (11 background genes). The genes from BayesR regions were IRAK3 (chromosome 10, WHWT), MYD88 (chromosome 23, GR), and TNIP1 (chromosome 4, LR), and TLR1, TLR6, and TLR10 were from the XP-EHH region on chromosome 3 in LR. The other significant term was cellular anatomical entity (370 observed genes, with 19,037 genes in the background).
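STRING reports over-representation of annotation terms among the submitted genes relative to a background set, with FDR correction. As we understand it, STRING uses a hypergeometric test with Benjamini–Hochberg correction. The sketch below implements those two standard steps, using hypothetical function names. For a term such as the MyD88-dependent pathway, the query contains k term genes and K genes in the background carry the term. This is an illustration of the statistics, not STRING's code.

```python
from math import comb


def hypergeom_enrichment_p(k, n_query, K, N):
    """One-sided over-representation p-value, P(X >= k), when n_query genes
    are drawn from N background genes of which K carry the term."""
    upper = min(n_query, K)
    hits = sum(comb(K, i) * comb(N - K, n_query - i) for i in range(k, upper + 1))
    return hits / comb(N, n_query)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR), in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```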

In summary, canine AD candidate genes in BayesR and XP-EHH regions can be assigned functions in the epidermis and/or immunity, and multiple genes in three BayesR and five XP-EHH regions were also detected in human GWAS of dermatitis, atopic eczema, eczema, and/or psoriasis (Table 3 and Supplementary Data 14–15).

## Discussion

We defined 15 AD-associated loci using BayesR, represented by 54 effect variants across four dog breeds. Our results overlap with human AD-associated regions and genes. This indicates that similarities between canine and human AD exist at the genetic level, and not only in the clinical presentation and immunologic imbalance. The AD-associated locus on chromosome 17, identified with BayesR in LR, overlaps with associated variants from two human AD meta-GWAS6,7 and harbors candidate genes from studies of mastocytosis40 and eczema41. Several genes located close to the effect variants on chromosome 17 encode proteins relevant to skin and immunity. For example, BCL9 is a transcriptional coactivator associated with B-cell acute lymphoblastic leukemia42 and is known to enhance transcriptional responses to Wnt signaling in both B- and T-cell lines43. Mutations in VPS45 result in defective endosomal intracellular protein trafficking and severely defective neutrophils, which underlie an immunodeficiency syndrome in humans44. Neutrophilic skin infiltration is required for the development of chronic itch, and neutrophil depletion reduced itch-evoked scratching in a mouse model of AD45. Mutations in the ECM1 gene cause lipoid proteinosis, a rare condition characterized by abnormal thickening of the skin, suggesting that this protein is important for skin adhesion, epidermal differentiation, and wound healing46. ECM1 also has an important function in promoting M1 macrophage polarization, which is crucial for controlling inflammation and tissue repair in the intestine47. The top effect variant on chromosome 17 in LR resides in a potential regulatory region between MTMR11 and OTUD7B. It overlaps 15 canine ATAC-seq peaks from nine different tissues, as well as one GeneHancer promoter/enhancer element that interacts with several genes in the topologically associating domain (TAD), including OTUD7B.
Another effect variant on chromosome 17, located 95 kb from the top variant and in the same TAD, also overlaps with a potential regulatory region covered by eight ATAC-seq peaks from four different tissues and with one GeneHancer element in hg38. OTUD7B acts as a negative regulator of the non-canonical NF-κB pathway, and OTUD7B deficiency results in B-cell hyper-responsiveness to antigens48. It also plays a role in T cell homeostasis and normal T cell responses49 and has been associated with eczema in human GWAS41. Several of the variants in LD with the effect variants on chromosome 17 also reside in canine ATAC-seq peaks. A few of these variants overlap peaks represented in more than 10 datasets, and the broadest overlap for a single variant spans over 35 datasets, representing all individuals and all tissues in the database (with interactions indicated for >17 genes in human). Some of these variants are also located within human ENCODE cCREs24,25,26 and/or GeneHancer27 elements, which indicates a potentially conserved regulatory function at these positions in dogs and humans. In addition, the PDE4DIP40, OTUD7B41, CIART6, MRPS216, and SEMA6C7 genes in the canine AD-associated locus on chromosome 17 were among the associated genes from human GWAS of related diseases (Table 3 and Supplementary Data 14).
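Counting how many ATAC-seq datasets support each variant, as done above for the chromosome 17 variants, reduces to an interval-overlap test. The following is a minimal sketch, assuming peaks are stored per dataset as BED-style intervals (0-based, half-open) and variants as 1-based positions. The dataset names, data layout, and function name are hypothetical, and a real analysis would use an indexed interval tool rather than this linear scan.

```python
def datasets_overlapping(variants, peaks_by_dataset):
    """For each variant (chrom, 1-based pos), count the datasets holding at
    least one peak that covers it.

    peaks_by_dataset: {dataset_name: [(chrom, start, end), ...]} with BED
    coordinates, so a peak covers 1-based positions start + 1 .. end.
    """
    counts = {}
    for chrom, pos in variants:
        n = 0
        for peaks in peaks_by_dataset.values():
            if any(c == chrom and start < pos <= end for c, start, end in peaks):
                n += 1
        counts[(chrom, pos)] = n
    return counts
```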

The highest effect size variant in LR was intergenic and positioned 44 kb from ARL14 and 45 kb from KPNA4 on chromosome 34. ARL14 controls the movement of MHC-II vesicles in human dendritic cells50, whereas KPNA2 is involved in signal-transduction pathways that regulate epidermal proliferation and differentiation51. The effect variant on chromosome 37 in LR was intergenic between SLC4A3 and EPHA4. EphA receptors and their ligands are expressed throughout all layers of the human epidermis and in the basal layer of the mouse epidermis, and they are functionally integrated with intercellular adhesion complexes. Ephrin signaling complexes play a crucial role in epidermal cell–cell communication and regulate normal keratinocyte behavior. Alterations in the epidermal ephrin axis have been associated with wound healing defects and inflammatory skin conditions52. The ANO3 gene on chromosome 21 (343 kb from the effect variant) in LR has been associated with eczema in humans53.

One associated locus was defined in GR, consisting of three effect variants located within a ~17.5 kb region on chromosome 23. One effect variant (chr23:8,186,340) resides in a region of canine open chromatin and overlaps with 16 ATAC-seq peaks in datasets from different tissues and individuals, as well as with a GeneHancer element when lifted to hg38. This GeneHancer element is assigned promoter and enhancer functions and interacts with several genes in the region. When lifted over to canFam4, this variant was located in the first exon of two longer ACVR2B transcripts. ACVR2B belongs to the type II activin receptor class, and activin-A has been implicated in several aspects of immunity, with fundamental roles in allergic responses and tissue remodeling in human allergic diseases, including allergic asthma and AD54. In mice, activin-A participates in the maintenance of skin homeostasis55.

One associated locus, on chromosome 9, was defined in GSD, with the top effect variant located in an intron of ABCA9. Several ATP-binding cassette (ABC) transporter genes have been associated with skin disorders; for example, ABCA12 has been associated with keratosis pilaris56 and Harlequin ichthyosis57. Furthermore, transcriptional expression of ABCA9 is induced during the differentiation of monocytes into macrophages58, and macrophages are known to increase in number in acutely and chronically inflamed AD skin59. Consistent with the canine AD-associated PKP2 (plakophilin 2) locus previously described by us12, the 16th-ranked variant (effect size = 6.8 × 10−5, chr27:16,009,789) in the BayesR analysis of GSD was located 14 kb upstream of the PKP2 gene.

It should be noted that the cutoff for defining effect variants is somewhat arbitrary. In three breeds, only one or two associated loci were identified in the BayesR analysis, which suggests that the chosen cutoff was too strict, given that multiple risk factors were expected. However, lowering the effect size cutoff increases the risk of false positives20. In a BayesR study of anterior cruciate ligament rupture in LR, the top 50 effect variants were presented60. Applying that approach in our study would add markers to the already defined associated loci, but it would also identify additional loci harboring variants with lower effect sizes, for a total of 50 associated loci (Supplementary Data 16). In follow-up studies, these loci could be investigated further, and larger sample sizes could identify additional lower-effect variants of relevance to canine AD with higher certainty. A lower mean absolute effect size per variant is also expected in a denser marker set. This is because BayesR iteratively assigns variants to the different effect size distributions, and variants in high LD are selected at random across iterations, which spreads the effect among them. On the other hand, a denser marker set indicates genomic positions more precisely and decreases the risk of missing important trait-associated regions.

Breed type-specific selection in GSD was also evident in our data. The selection signal on chromosome 19 was detected in GSD controls of the working type and extends across the gene LRP1B, which encodes a 4599-amino-acid member of the low-density lipoprotein receptor (LDLR) protein family. LRP1B is mainly expressed in brain and endocrine tissues in humans81 and in the brain of mice82. According to the Broad Improved Canine Annotation v1 (canFam3.1), multiple transcripts were found in dog brain and kidney but not in other tissues. LRP1B has been primarily described as a cancer-driving gene81, for example in multiple myeloma83. It has also been associated with asthma84 and eczema85 in human GWAS (Table 3 and Supplementary Data 15), and with cognitive decline86, Alzheimer’s disease87, and infant cognitive ability88. The gene KYNU, encoding the enzyme kynureninase, is located 774 kb from chr19:sel and is the closest neighboring gene to LRP1B. KYNU showed elevated expression in psoriatic skin lesions compared with normal skin in humans89. In normal skin, KYNU was primarily expressed in the basal layer of the epidermis, whereas in psoriatic skin its expression was detected across the whole epidermis and in infiltrating immune cells (e.g., T cells, macrophages, and dendritic cells)89. Induced psoriasis-like symptoms in mice were reduced after the application of KYNU inhibitors, and knockdown of KYNU significantly inhibited the production of inflammatory cytokines in keratinocyte cell lines. Together, these observations suggest that KYNU is a likely therapeutic target in psoriasis89. Another neighboring gene, ARHGAP15, has been associated with eczema41,80, but its known functions are not directly related to the AD phenotype.
Since allele T at chr19:sel was more frequent in controls of working type GSD, one possible scenario is that specific variants affecting the LRP1B gene have influenced a trait desired for work that was selected for in this GSD breed type, and that additional (hitchhiking) variants affecting (and potentially inhibiting) KYNU are AD-protective. Both alleles of the top three variants of the LR and GSD XP-EHH regions on chromosomes 3 and 19 were present in wolves90; thus, these alleles are not unique to the breeds or to dogs in general.

Four integrin alpha genes located in different associated loci in LR and GR were highlighted in the STRING enrichment analysis. Integrins are heterodimeric transmembrane cell adhesion molecules with alpha (α) and beta (β) subunits combined in different dimers with diverse functions, for example in cell surface adhesion and signaling. The integrin alpha-4 subunit (ITGA4) associates with the beta-1 subunit to form integrin α4β1 in leukocytes, or with the beta-7 subunit to form integrin α4β7, which is present in a subset of memory T cells91. ITGA4 was upregulated in the non-lesional epidermis of horses with insect bite hypersensitivity, an IgE-mediated dermatitis caused by insect bites that shares features with human and canine AD92. ITGA9 mRNA expression was increased in human psoriatic skin93, and, overall, ITGA genes, including those identified in the associated loci, have also been associated with human skin cutaneous melanoma94. The term connective tissue was represented by genes from seven BayesR and five XP-EHH regions. Interestingly, a relationship has been described between AD and autoimmune connective tissue diseases, including systemic lupus erythematosus, rheumatoid arthritis, and Sjögren’s syndrome95.

In conclusion, we detected multiple canine AD-associated loci, including one that overlaps with FLG, the major genetic risk factor described in human AD. Multiple candidate genes were assigned functions related to the epidermis and/or immunity, and some were also detected in human GWAS of related diseases. We correlated within-breed selection with the accumulation of AD risk or protective factors. The approaches used in this study have improved our understanding of the complex genetics of canine AD in four dog breeds predisposed to this disease, and they implicate shared genetic causes of canine and human atopic dermatitis.

## Methods

### Labrador retriever; breed types and coat color

It is known that the LR breed has been split into a common type, bred for conformation and pet use, and a gundog type, bred for hunting61,66. We classified these breed types based on information from Sweden, the UK, and Switzerland. For the Swedish LR, we extracted each dog’s kennel name and matched it against the LR breed club’s criteria for kennels with a gundog breeding focus. All LR kennels in Sweden active since the 1970s are listed on the Swedish Kennel Club (SKK) webpage98, and the Swedish LR breed club lists gundog LR kennels (with puppies born since 2006)99. Of the 102 LR of Swedish origin in our dataset, 32 had kennel names included in the list of gundog kennels, and 51 were not in this list and were thus regarded as coming from kennels of common type LR. The remaining 19 had no specified kennel name or a kennel name not listed by SKK. From owner questionnaire information on UK LR, we identified seven gundogs, which clustered together with the Swedish gundogs (low PC1), and two show dogs, which clustered with the Swedish common type (high PC1). The Swiss LR cohort did not include any gundogs, but it included one police dog (low PC1, clustering with the gundogs) and 11 guide dogs for the blind (high PC1). Generally, the Swiss cohort clustered together with the Swedish/UK common type (high PC1), but it also formed a subcluster that partly overlapped with the common type and did not overlap with the gundog subpopulation. The common type includes LR used for dog shows, as pets, and for different kinds of work (guide dogs for the blind, snow avalanche rescue dogs, etc., primarily represented by the Swiss LR). We also extracted coat color for the majority of the LR (all except 26) to evaluate whether chocolate coat color also supported the division into a common type and a gundog type of LR. Chocolate-colored LR were only present in the common type subpopulation (Supplementary Fig. 3) and represented 9.5% of the LR in the Swedish cohort, 20% in the Swiss cohort, and 24% in the UK cohort.

### German shepherd; breed types and coat color

The GSD is also bred for either show or working capabilities, resulting in a split into two breed types. The numbers of working and show merits were extracted for a total of 247 Swedish GSD kennels with at least 50 registered offspring. Of the 219 GSD included in our analysis, 121 had a kennel name. Of these, 30 were from kennels with a lower work proportion (number of working merits/number of show merits < 0.5), whereas 91 were from kennels with a higher work proportion (≥0.5); these are henceforth referred to as show and working type kennels, respectively. We also extracted the registered coat color from SKK for 192 GSD. The different colors were subdivided into two color classes: (1) gray or black, including all dogs with the colors gray, dark gray, black with gray markings, and black; and (2) brown and black, which included dogs of black or gray color with brown, yellow, or red markings. The most common colors were black with brown markings (N = 102), gray (N = 37), and black with yellow markings (N = 19). Each of the remaining colors was assigned to seven dogs or fewer. The gray or black color class was almost exclusively present in the low PC1 subpopulation, and GSDs from kennels with high working proportions were more common in the low PC1 subpopulation (Supplementary Fig. 4). It is generally known that working type GSD are more often gray or black than show type GSD. Based on these two lines of evidence, we concluded that the split along PC1 is most likely explained by a breed type division into working and show type GSD.
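The kennel classification can be written as a small rule. This sketch rests on stated assumptions. Kennels with fewer than 50 registered offspring are skipped. The work proportion is the number of working merits divided by the number of show merits, with a cutoff of 0.5. Kennels with no show merits are handled by our own hypothetical choice: they are treated as working, or as unclassified if they have no merits at all, because the text does not address this case. The function names are hypothetical.

```python
def kennel_type(n_working, n_show, threshold=0.5):
    """Classify a kennel by work proportion = n_working / n_show."""
    if n_show == 0:
        # Hypothetical handling: the ratio is undefined without show merits.
        return "working" if n_working > 0 else "unclassified"
    return "working" if n_working / n_show >= threshold else "show"


def classify_kennels(kennels, min_offspring=50):
    """kennels: {name: (n_offspring, n_working_merits, n_show_merits)}.
    Kennels with fewer than min_offspring registered offspring are left out."""
    return {name: kennel_type(w, s)
            for name, (n_off, w, s) in kennels.items() if n_off >= min_offspring}
```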

### Sample collection and genotyping

We retrieved genotype data from the Illumina CanineHD 170 K BeadChip genotyping array (Illumina, San Diego, CA), generated from blood samples collected from privately owned dogs in collaboration with several veterinary clinics throughout Sweden (LR, GR, GSD, and WHWT), the US (WHWT), and Switzerland (LR and GR). The Swiss cohort included samples from dogs collected in Switzerland, the Netherlands, Finland, Germany, and France. Dogs were recruited to the project when their owners visited the veterinary clinic to seek health care for AD (cases) or unrelated problems (controls), or they were recruited as healthy controls that subsequently visited the veterinary clinic to leave blood samples. This applied to all countries except the UK. Saliva samples from the UK (LR and GR) were collected by the owners of the dogs and posted to the research team as part of the questionnaire study64, and they were genotyped by Neogen using the Illumina CanineHD 230 K BeadChip (Illumina, San Diego, CA). Samples for each cohort were collected strictly according to the regulations of each country. Ethical approval for the UK project was provided by the University of Nottingham School of Veterinary Medicine and Science Committee for Animal Research and Ethics. Protocols for US dogs were approved by the North Carolina State University Institutional Animal Care and Use Committee. Collection of the samples and clinical data from Swiss dogs was approved by the Cantonal Committee for Animal Experiments (Canton of Bern; permits 22/07 and 23/10), and from the Swedish dogs by ethical permit C12/15.

The CanineHD 230 K BeadChip is an extension of the CanineHD 170 K BeadChip, and we started with a merged genotype dataset consisting of 167,211 SNPs and 1152 dogs from four dog breeds. Genomic coordinates refer to the canFam3.1 genome assembly unless otherwise specified. We used plink (v. 1.90b4.9)100 and R (v. 4.9.2)101 with the R packages GENESIS (v. 2.24.0)102, GWASTools (v. 1.40.0)103, and SNPRelate (v. 1.28.0)104 to analyze the genotyped datasets separately by breed. QC was performed per dog breed (plink --geno 0.05 --mind 0.05 --maf 0.05). Genetic relationships were estimated using the KING method-of-moments approach for identity-by-descent analysis105 in SNPRelate. Individuals with a kinship coefficient above 0.177 (the KING boundary between first- and second-degree relatives) were removed to generate a dataset excluding highly related dogs. PCs were estimated using pcair (part of the GENESIS R package) with the following settings for snpgdsLDpruning: method = r, ld.threshold = 0.7, slide.max.bp = 250000, maf = 0.05, missing.rate = 0.05; and for pcairPartition and pcair: kin.thresh = 0.125, div.thresh = −0.125. In pcair, the PCs were estimated in a subset of individuals unrelated at the kinship coefficient threshold of 0.125, and the PCs were then projected onto the more related individuals (i.e., kinship 0.125–0.177). This was done to avoid bias due to relatedness in the PC estimation. The individual filtering and PC estimation were performed on the original genotyped datasets, and the resulting sample set and PCs were used for downstream analyses of imputed markers. The R package qqman (v. 0.1.8)106 and Adobe Illustrator (v. 26.0.2) were used for plots and final editing of figures.
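Removing one dog from each pair with kinship >0.177 can be done in several ways, and the text does not state which dog of a pair was removed. The sketch below uses one common greedy heuristic: repeatedly drop the dog involved in the most over-threshold pairs. This is an assumption, not the study's documented procedure. The input pairs could come, for example, from KING estimates, and the function name is hypothetical.

```python
from collections import Counter


def prune_related(pairs, threshold=0.177):
    """Greedy relatedness filter.

    pairs: iterable of (id1, id2, kinship) tuples.
    Repeatedly removes the individual involved in the most remaining pairs
    above threshold (ties broken by ID). Returns the set of removed IDs.
    """
    close = [(a, b) for a, b, k in pairs if k > threshold]
    removed = set()
    while close:
        degree = Counter(x for pair in close for x in pair)
        worst = min(degree, key=lambda x: (-degree[x], x))
        removed.add(worst)
        close = [(a, b) for a, b in close if worst not in (a, b)]
    return removed
```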

### Imputation

We imputed the genotyped dataset using a reference panel of purebred dogs (435 individuals) extracted from a publicly available dataset that also includes wolves and other canids90. Imputed datasets have better genome coverage, which improves sensitivity and precision when detecting candidate regions in the association study and makes it less likely that important regions are missed. One potential problem with imputation is that haplotypes unique to the study population may not be covered by the reference panel. However, this risk is small here, because the reference panel is extensive and includes our studied breeds, and all genotyped markers are retained. Quality parameters used in plink prior to imputation were --maf 0.0001, --geno 0.05, and --mind 0.05, and the dataset before imputation consisted of 1152 dogs (347 LR, 294 GR, 231 GSD, and 280 WHWT) and 148,889 SNPs. Imputation was performed as follows: (i) the data were split by chromosome in plink (chromosome 1 was split into two parts) while filtering on --maf 0.001 and --geno 0.2; (ii) SHAPEIT2 (r904)107 (--check) was used to check whether the SNPs in the genotype data were present in the reference panel, and SNPs not found (N = 9529) were excluded in the next step; (iii) the genotype data were pre-phased with the reference panel using SHAPEIT2 (effective population size 500; Markov chain Monte Carlo iterations --burn 10 --prune 10 --main 50; threads -T 5); and (iv) IMPUTE2 (v. 2.3.2)108 was used to impute the genotype data (--Ne 500). We used SHAPEIT2 to check SNPs on chromosome X (after applying plink --split-x 6,600,000 123,798,852109, --maf 0.001, and --geno 0.05 to chromosome X specifically) and identified problematic dogs with high rates of heterozygosity on chromosome X (>1%). These dogs were removed when all chromosomes except chromosome X were merged after imputation, leaving 336 LR, 287 GR, 229 GSD, and 275 WHWT.
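The chromosome X heterozygosity screen can be sketched as below. The text does not state which dogs the >1% rule was applied to. A common rationale for such a screen is that males are hemizygous outside the pseudoautosomal region, so heterozygous calls there point to genotyping or sex-annotation problems. The helper names and the 0/1/2 genotype encoding are assumptions.

```python
def chrx_het_rate(genotypes):
    """Fraction of non-missing chrX genotypes that are heterozygous.

    `genotypes` holds alternate-allele counts (0, 1, 2) or None for missing
    (hypothetical encoding).
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    return sum(g == 1 for g in called) / len(called)


def flag_high_chrx_het(samples, max_rate=0.01):
    """Return IDs whose chrX heterozygosity exceeds `max_rate` (1% in the text)."""
    return sorted(dog for dog, gts in samples.items() if chrx_het_rate(gts) > max_rate)
```

Which samples to pass in (all dogs, or only dogs recorded as male) is left to the caller, because the text does not specify it.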

### Dataset quality and details

After imputation, the dataset was split by breed and analyzed breed-wise (QC: plink --geno 0.02 --mind 0.05 --maf 0.05). We used plink to LD-prune the imputed datasets (--indep-pairwise 25 5 0.999) and then added back all genotyped SNPs that had been excluded in the pruning step. A second round of LD pruning with a stricter threshold (--indep-pairwise 50 5 0.99) was performed in GR and LR, because the number of variants remaining after the first round exceeded the capacity of fastPHASE (v. 1.4.8)110 (Supplementary Table 2).
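The two pruning steps (sliding-window pruning on r², then restoring genotyped SNPs) can be approximated with the sketch below. It is not plink's implementation. Here r² is the squared Pearson correlation of 0/1/2 dosages, and when a pair exceeds the threshold the later SNP is always dropped, whereas plink applies its own rules to decide which SNP to remove. All function names are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two dosage vectors (0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def indep_pairwise(dosages, window=25, step=5, threshold=0.999):
    """Indices of SNPs kept after sliding-window pruning (simplified --indep-pairwise)."""
    keep = [True] * len(dosages)
    for start in range(0, len(dosages), step):
        in_window = range(start, min(start + window, len(dosages)))
        for i in in_window:
            for j in in_window:
                if i < j and keep[i] and keep[j] and r_squared(dosages[i], dosages[j]) > threshold:
                    keep[j] = False  # simplification: always drop the later SNP
    return [i for i, k in enumerate(keep) if k]


def prune_then_restore(dosages, genotyped, **kwargs):
    """Prune imputed SNPs, then add back every genotyped SNP index."""
    return sorted(set(indep_pairwise(dosages, **kwargs)) | set(genotyped))
```

The second, stricter round described for GR and LR would correspond to calling `indep_pairwise` again with `window=50, threshold=0.99`.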

### Imputation validation

IMPUTE2 automatically produces a concordance table from its internal cross-validation. The program masks the genotypes of one variant at a time, imputes the masked genotypes, and compares the imputed genotypes with the original ones. The reported concordance rate per chromosome after imputation ranged from 96.2% to 98.8%. As an additional validation of imputation quality, we randomly masked 5,000 SNPs in the genotyped dataset (3.36% of the total SNP set). Imputation was performed as described above. Of the masked SNPs, 3,509 (70.2%) were imputed, with a concordance rate of 99.5% across all four breeds, including chromosome X (plink --merge-mode 7). Excluding chromosome X left 3,408 SNPs for validation, with a concordance rate of 99.5% across all breeds together. Extracting the data per breed and filtering on --maf 0.05 gave concordance rates of 99.4% in LR (2,453 imputed SNPs), 99.6% in GR (2,184 SNPs), 99.6% in GSD (2,036 SNPs), and 99.7% in WHWT (2,025 SNPs). Without a MAF filter, the concordance rate ranged from 99.4% to 99.6% across breeds.
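The masking validation reduces to two ratios: the fraction of masked SNPs that were imputed, and the genotype concordance among imputed calls. A minimal sketch, assuming genotypes are held in dictionaries keyed by (SNP, dog) with 0/1/2 values. These structures are hypothetical and are not the output format of plink or IMPUTE2.

```python
def imputed_fraction(masked_snps, imputed_snps):
    """Share of masked SNPs that reappear in the imputed dataset."""
    masked = set(masked_snps)
    return len(masked & set(imputed_snps)) / len(masked)


def concordance(truth, imputed):
    """Fraction of (snp, dog) calls where the imputed genotype equals the original.

    Only calls that are non-missing in both datasets are compared.
    """
    shared = [k for k, g in truth.items() if g is not None and imputed.get(k) is not None]
    if not shared:
        return float("nan")
    return sum(truth[k] == imputed[k] for k in shared) / len(shared)
```

With the numbers in the text, 3,509 imputed SNPs out of 5,000 masked SNPs gives 3509 / 5000 = 0.7018, i.e., the reported 70.2%.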

### Bayesian mixture model

We used the BMM BayesR (v. 1, update 01/04/2021)20,21 to perform a GWAS in each breed separately. The BayesR algorithm estimates the probability that a variant's effect size belongs to each of the following four normal distributions: $N(0, 0\,\sigma_g^2)$, i.e., zero effect; $N(0, 0.0001\,\sigma_g^2)$; $N(0, 0.001\,\sigma_g^2)$; or $N(0, 0.01\,\sigma_g^2)$, where $\sigma_g^2$ is the additive genetic variance. The proportions of variants belonging to each distribution are updated in each iteration. To achieve convergence, the model was run with 300,000 iterations and a burn-in of 100,000, and each analysis was repeated five times. The absolute value of the average effect size per variant was reported as the final result. Fixed effects were the first two (LR and WHWT) or three (GR and GSD) PCs, which fitNullModel in GENESIS identified as having a significant effect on the trait (p < 0.05). For GSD, −log10 of IgA levels and −log10 of age in years at sampling were also included as fixed effects, in line with the described relationship between AD and low serum IgA levels in GSD12. To define effect variants with the same cutoff across all four breeds, we chose the value 0.0001 from the lowest non-zero effect size distribution and applied it to all breeds to generate comparable results. Accordingly, variants with a mean absolute effect size ≥ $1.00 \times 10^{-3}$ were regarded as effect variants. Effect variants separated by >1 Mb were considered to belong to separate associated loci, and the effect variant with the highest absolute effect size in each locus was extracted to represent that locus. A risk index was calculated by scoring the risk genotype at each associated locus (0 = no risk allele, 0.5 = one risk allele, 1 = two risk alleles).
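The post-processing of BayesR output described above (effect-size cutoff, grouping into loci, lead variant per locus, risk index) can be sketched as follows. The text does not state two choices that the sketch makes. Consecutive effect variants on a chromosome that are ≤1 Mb apart are chained into one locus, and the risk index is taken as the sum of the per-locus scores. The input tuples are hypothetical and do not reflect BayesR's output format.

```python
EFFECT_CUTOFF = 1.00e-3  # mean absolute effect size threshold from the text
LOCUS_GAP = 1_000_000    # effect variants >1 Mb apart start a new locus


def lead_variants(variants, cutoff=EFFECT_CUTOFF, gap=LOCUS_GAP):
    """Return the highest-|effect| variant of each locus.

    `variants` is a list of (chrom, pos, mean_abs_effect) tuples.
    Assumption: loci are built by chaining consecutive variants <= `gap` apart.
    """
    effect = sorted(v for v in variants if v[2] >= cutoff)
    loci = []
    for chrom, pos, eff in effect:
        prev = loci[-1][-1] if loci else None
        if prev is not None and prev[0] == chrom and pos - prev[1] <= gap:
            loci[-1].append((chrom, pos, eff))
        else:
            loci.append([(chrom, pos, eff)])
    return [max(locus, key=lambda v: v[2]) for locus in loci]


def risk_index(risk_allele_counts):
    """Sum of per-locus scores: 0, 1 or 2 risk alleles -> 0, 0.5 or 1 (sum is an assumption)."""
    return sum(count / 2 for count in risk_allele_counts)
```

For example, a dog carrying zero, one, and two risk alleles at three lead variants would get a risk index of 0 + 0.5 + 1 = 1.5 under this summation.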

### Characterization of canine AD regions

Genes in the candidate regions shown in Figs. 3–5 were extracted from the canFam3.1 public track hub Broad Improved Canine Annotation v1114. For clearer visualization, only the longest transcript per gene was kept, and transcripts named ENSCAFG or CFRNASEQ_PROT (i.e., lacking an official gene symbol) were removed. For the main tables, gene transcript information was extracted from the canFam3.1 genome assembly, but we also used the UU_Cfam_GSD_1.0/canFam431 (canFam4) annotation to provide additional information and to update transcript information.
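The transcript filtering used for the figures can be expressed as a small function. This is a hypothetical sketch. It treats transcript length as genomic span (end − start), which the text does not specify. It also assumes that unnamed transcripts can be recognized by their ENSCAFG or CFRNASEQ_PROT name prefix.

```python
UNNAMED_PREFIXES = ("ENSCAFG", "CFRNASEQ_PROT")  # names lacking an official gene symbol


def longest_named_transcripts(transcripts, drop_prefixes=UNNAMED_PREFIXES):
    """Keep the longest transcript per named gene.

    `transcripts` is an iterable of (gene, transcript_id, start, end) tuples.
    Length is approximated by genomic span (an assumption).
    """
    best = {}
    for gene, tx_id, start, end in transcripts:
        if gene.startswith(drop_prefixes):
            continue  # drop transcripts without an official gene symbol
        span = end - start
        if gene not in best or span > best[gene][3] - best[gene][2]:
            best[gene] = (gene, tx_id, start, end)
    return sorted(best.values())
```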

To investigate potential candidate genes located further from the effect variants in AD-associated loci, we extracted protein-coding genes within 1 Mb (the approximate size of a TAD115) of the effect variants of each associated locus and denoted these BayesR regions. For the selection analysis, genes within XP-EHH regions were extracted. STRING (v. 11.5)116 was used to evaluate gene set enrichment and potential interactions across loci. Both Homo sapiens and Canis lupus familiaris were used as background models for evaluating the genes in BayesR regions, in XP-EHH regions, and in both sets combined. Enrichment terms represented by only one region were regarded as non-informative for evaluating enrichment across loci, and specific terms represented by many regions and relevant to canine AD were highlighted. In addition, all genes located in BayesR regions (±1 Mb from effect variants) and in XP-EHH regions (Supplementary Data 10–11) were compared with genes associated with dermatitis, atopic eczema, eczema, and psoriasis in human GWAS from the GWAS catalog117 to detect gene overlaps between human skin disorders and canine AD.
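Defining BayesR regions and comparing them with the GWAS catalog amounts to an interval query followed by a set intersection. A minimal sketch, assuming that the gene coordinates are already restricted to protein-coding genes. It further assumes that canine and human genes are matched by identical (case-insensitive) gene symbols, which simplifies true orthology mapping. The function names are hypothetical.

```python
def genes_near(variants, genes, flank=1_000_000):
    """Symbols of genes with any part within `flank` bp of an effect variant.

    `variants`: iterable of (chrom, pos); `genes`: iterable of (symbol, chrom, start, end).
    """
    hits = set()
    for chrom, pos in variants:
        lo, hi = pos - flank, pos + flank
        for symbol, g_chrom, start, end in genes:
            if g_chrom == chrom and start <= hi and end >= lo:
                hits.add(symbol)
    return hits


def overlap_with_catalog(canine_genes, human_gwas_genes):
    """Genes shared with a human GWAS gene list, matched by symbol (an assumption)."""
    return sorted({g.upper() for g in canine_genes} & {g.upper() for g in human_gwas_genes})
```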

The phyloP score is the signed −log10 p-value under a null hypothesis of neutral evolution. A positive score indicates evolutionary conservation, i.e., positions that have remained unchanged across many species, presumably because they are functional. In contrast, a negative score indicates accelerated evolution, potentially corresponding to positive selection. Genomic positions with phyloP scores >2.56 were considered evolutionarily constrained at FDR < 5% (240 species29). We considered the phyloP scores of variants defined as extreme markers (−log10(p) ≥ 4.0) in candidate regions of selection, of all effect variants, and of variants in LD with the effect variants in the chromosome 17 locus. The effect variants were intersected with BarkBase ATAC-seq data23 and, for variants lifted over to hg38, with ENCODE cCREs24,25,26 (from the UCSC Genome Browser) and GeneHancer27 elements. LiftOver118 between genomes (canFam3.1, canFam4, and human hg38) was used to evaluate and compare functional and non-functional positions across assemblies.
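The interpretation of phyloP scores above can be written as two small helpers. This is an illustrative sketch. The 2.56 cutoff is the one stated for the 240-species alignment. Following the sign convention in the text, any negative score is labeled "accelerated", regardless of its magnitude, so this is not a significance test for acceleration.

```python
PHYLOP_CONSTRAINED = 2.56  # FDR < 5% cutoff for the 240-species alignment, from the text


def phylop_pvalue(score):
    """Recover the p-value from a signed -log10 phyloP score."""
    return 10 ** -abs(score)


def classify_phylop(score):
    """Label a position as constrained, accelerated (negative score) or neither."""
    if score > PHYLOP_CONSTRAINED:
        return "constrained"
    if score < 0:
        return "accelerated"
    return "neither"
```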

### Oxford Nanopore Technologies whole genome sequencing

Two LR AD cases (ID1 and ID2) and two LR controls (ID3 and ID4) were chosen for ONT long-read sequencing. At the chromosome 17 effect variants, the cases were heterozygous for the risk allele and the controls were homozygous for the non-risk allele. DNA was extracted from EDTA blood samples from the four dogs using the NucleoSpin Blood kit (ref 740951, Macherey-Nagel) following the standard protocol. The only exception was that DNA was eluted in 50 µl H2O instead of Buffer BE, followed by incubation at room temperature for 3 min and centrifugation for 1 min at 11,000 × g. The elution step was repeated once. DNA concentration was measured by Qubit (Invitrogen, Thermo Fisher Scientific), and DNA size and integrity were assessed using a Genomic DNA ScreenTape (Agilent). DNA was fragmented with g-TUBEs (Covaris) to an average fragment size of 6 kb. The fragmented DNA was prepared for sequencing using the MinION SQK-LSK109 kit (ONT) following the protocol except for two minor differences. For ID1 and ID3, the AMX-F adapter mix was used, whereas for ID2 and ID4 the AMX adapter mix was used. The DNA libraries were loaded and run on four separate R9.4.1 SpotON flow cells (ONT). The .fast5 files were base-called with the super-accurate model in Guppy (v. 6.0.1) (ONT). FASTQ files were mapped to canFam4 with minimap2 -x map-ont119,120. Variants were called using clair3121,122. SVs were called with Sniffles (v. 2.0.3)123 using phased BAMs with the --phase option, --tandem-repeats to define repeat regions in the reference genome, and --reference canFam4. SVs were analyzed in 11-bp windows centered on the start position reported by Sniffles, because the exact start position could vary by a few bp between samples. The end position reported by Sniffles was recorded as the exact position. The windows from all four samples were intersected with bedtools intersect124.
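The 11-bp window comparison of SV calls across the four dogs can be sketched in plain Python. The sketch makes three assumptions. Coordinates are 1-based and inclusive (bedtools itself works on 0-based, half-open BED intervals). Each call is a (chrom, start, end) tuple. An SV counts as shared when its window overlaps a window in every other sample. The helper names are hypothetical.

```python
def sv_window(chrom, start, half_width=5):
    """11-bp window (inclusive) centered on the SV start position."""
    return (chrom, start - half_width, start + half_width)


def windows_overlap(a, b):
    """True if two inclusive (chrom, lo, hi) windows overlap."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def shared_svs(samples, reference):
    """SVs in `reference` whose start window overlaps a call in every other sample.

    `samples` maps sample name -> list of (chrom, start, end).
    Returns (chrom, start, {sample: exact end position}) per shared SV.
    """
    shared = []
    for chrom, start, end in samples[reference]:
        win = sv_window(chrom, start)
        ends = {reference: end}
        for name, calls in samples.items():
            if name == reference:
                continue
            match = next((e for c, s, e in calls if windows_overlap(win, sv_window(c, s))), None)
            if match is None:
                break  # missing in this sample, so not shared by all
            ends[name] = match
        else:
            shared.append((chrom, start, ends))
    return shared
```

Because both windows span 11 bp, two calls are matched when their start positions lie within 10 bp of each other. End positions are kept as reported, in line with the text.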

Variants on chromosome 17 that followed the same genotype pattern as the chromosome 17 effect variants (heterozygous cases and homozygous non-risk controls) were extracted and evaluated in the non-LD-pruned imputed dataset. All variants in LD (r² > 0.8) with any of the chromosome 17 effect variants were extracted for further evaluation. Variants not present in the imputed data were also extracted and defined as novel. Two effect variants were excluded based on evaluation of the sequence data (Supplementary Note 1 and Supplementary Figs. 8 and 9). The four sequenced dogs were also evaluated individually using read-based haplotype phasing around the chromosome 17 effect variants. Phasing was performed using WhatsHap (v. 1.2.1)125 within the clair3 pipeline, and haplotypes were reconstructed with the bcftools126 consensus command. In IGV127, the phased BAM files were grouped and sorted by haplotype. Genotype and haplotype assignments for the chromosome 17 effect variants in LR were identified in the phased BAMs of all four individuals. The potential functional effects of variants were evaluated with SnpEff (v. 4.3t)30 (using canFam4 and a custom-built database with the NCBI annotation of the reference), phyloP29, BarkBase ATAC-seq data23, CpG annotation in canFam4, and, for variants lifted over to hg38, ENCODE cCREs (from the UCSC Genome Browser) and GeneHancer27 elements. LiftOver118 between genomes (canFam3.1, canFam4, and human hg38) was used to evaluate and compare functional and non-functional positions across assemblies.
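The two selection criteria for candidate variants can be sketched as two predicates: the genotype pattern in the four sequenced dogs, and LD with the effect variants in the imputed data. The sketch assumes that genotypes are 0/1/2 allele counts. It approximates "homozygous non-risk" as both controls carrying the same homozygous genotype. It computes r² as the squared Pearson correlation of dosages. None of these helpers comes from the authors' pipeline.

```python
CASES = ("ID1", "ID2")
CONTROLS = ("ID3", "ID4")


def follows_pattern(genotypes, cases=CASES, controls=CONTROLS):
    """True if both cases are heterozygous and both controls share one homozygous genotype.

    `genotypes` maps dog ID -> allele count (0/1/2); missing dogs fail the test.
    """
    if any(genotypes.get(dog) != 1 for dog in cases):
        return False
    control_gts = {genotypes.get(dog) for dog in controls}
    return len(control_gts) == 1 and control_gts <= {0, 2}


def r_squared(x, y):
    """Squared Pearson correlation between two dosage vectors (0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def in_ld_with_any(candidate, effect_variants, threshold=0.8):
    """True if the candidate's dosages have r² > threshold with any effect variant."""
    return any(r_squared(candidate, ev) > threshold for ev in effect_variants)
```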

To extract regions of low heterozygosity (i.e., regions where read-based phasing failed), the phased reads of the two controls and the two cases in the region canFam4 chr17:55,000,000–63,000,000 were filtered on the assigned haplotype tags HP1 and HP2 using bamtools128 filter. Reads lacking an HP tag in both controls, or in both cases, were extracted using bedtools124 intersect. These reads were merged into regions with bedtools merge when their start/end positions lay within 1 kb of each other. Regions unique to the controls were then extracted with bedtools intersect.
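The merging and control-specific extraction steps correspond to two interval operations. A minimal sketch, assuming half-open (chrom, start, end) intervals as in BED. The sketch also reads "unique to the controls" as control regions that overlap no case region, analogous to `bedtools intersect -v`. The merge step is analogous to `bedtools merge -d 1000`. The function names are hypothetical, and neither function reproduces the authors' exact commands.

```python
def merge_within(intervals, max_gap=1000):
    """Merge (chrom, start, end) intervals whose gap is at most `max_gap` bp."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start - merged[-1][2] <= max_gap:
            merged[-1][2] = max(merged[-1][2], end)  # extend the current region
        else:
            merged.append([chrom, start, end])
    return [tuple(m) for m in merged]


def unique_to(regions, others):
    """Regions that overlap no interval in `others` (half-open coordinates)."""
    return [
        r for r in regions
        if not any(r[0] == o[0] and r[1] < o[2] and o[1] < r[2] for o in others)
    ]
```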

### Statistics and reproducibility

Risk index comparisons between cases and controls were performed using Welch's two-sample t-test (two-tailed) and visualized as boxplots (R packages stats v. 4.1.2101 and graphics v. 4.1.2129). Differences in allele frequencies between cases and controls within each breed type of LR and GSD were tested using Pearson's chi-squared test with Yates' continuity correction (R package stats v. 4.1.2). We calculated the phenotypic variance explained by AD-associated loci using a linear model and ANOVA (R package stats v. 4.1.2). The models included PC1–2 in the analyses of LR and WHWT, PC1–3 in GR, and PC1–3, −log10(IgA), and −log10(Age) in GSD (as in BayesR). The total sample sizes for the BayesR and XP-EHH analyses were 321 LR, 256 GR, 219 GSD, and 235 WHWT. The same sample sizes were used in the additional statistical tests, except for the chi-squared tests of LR and GSD breed types. The total sample sizes for the breed types were 245 common-type LR, 76 gundog-type LR, 110 show-type GSD, and 107 working-type GSD.
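For readers reproducing these tests outside R, the two statistics can be sketched in dependency-free Python. `welch_t` returns Welch's t statistic and the Welch–Satterthwaite degrees of freedom. A two-tailed p-value would then come from the t distribution, for example `2 * scipy.stats.t.sf(abs(t), df)`, which is omitted here to avoid the SciPy dependency. `yates_chi2` assumes a 2×2 table of allele counts, with rows for cases and controls and columns for risk and non-risk alleles. The function names are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


def yates_chi2(table):
    """Pearson's chi-squared with Yates' continuity correction for a 2x2 table.

    The correction subtracts min(0.5, |O - E|), the capped form also used by R's chisq.test.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    observed = [[a, b], [c, d]]
    expected = [[rows[i] * cols[j] / n for j in range(2)] for i in range(2)]
    cells = [(i, j) for i in range(2) for j in range(2)]
    correction = min(0.5, min(abs(observed[i][j] - expected[i][j]) for i, j in cells))
    chi2 = sum((abs(observed[i][j] - expected[i][j]) - correction) ** 2 / expected[i][j]
               for i, j in cells)
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-squared with 1 df
    return chi2, p
```

For a table of [[10, 20], [20, 10]], every expected count is 15, and the corrected statistic is 4 × (5 − 0.5)² / 15 = 5.4.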

### Reporting summary

Further information on experimental design is available in the Nature Portfolio Reporting Summary linked to this Article.