## Introduction

Human campylobacteriosis exerts profound societal and economic costs. The World Health Organisation estimated that Campylobacter caused 95 million illnesses, 21,000 deaths and loss of 2.1 million disability-adjusted life years globally in 20101. In the United Kingdom alone, there were 63,946 laboratory-confirmed human cases in 2017, most of which were due to C. jejuni. The ratio of unreported cases of human campylobacteriosis to those captured by national surveillance in the United Kingdom is estimated to be 9.3 to 12, therefore the true burden may exceed half a million cases per annum at a projected median cost to the national health service of £50 million3. Human infections typically involve acute gastroenteritis, however debilitating sequelae may occur including Guillain–Barré syndrome (GBS) and other inflammatory neuropathies4. Poultry are the main reservoir of human campylobacteriosis and up to 80% of human cases may be attributable to the avian reservoir as a whole5,6. During 2016–2017, a United Kingdom-wide survey indicated that 54% of fresh retail chicken was contaminated with viable Campylobacter7. The number of C. jejuni in the caeca of chickens frequently exceeds 108 colony-forming units/gram and carcass contamination at slaughter can be difficult to avoid8,9. Quantitative risk assessments predict that even a relatively modest 2 log10 reduction in the number of Campylobacter on broiler carcasses could reduce the incidence of human disease due to contaminated chicken by up to 30-fold10. A pressing need therefore exists for strategies to reduce the entry of Campylobacter-contaminated poultry meat into the food chain.

As effective vaccines and treatments for pre-slaughter control of Campylobacter in poultry are lacking, much interest exists in the potential for breeding chickens with improved resistance to intestinal colonisation by C. jejuni. The scope for genetic control of Campylobacter in poultry has been demonstrated in commercial broiler lines that vary in resistance11,12,13,14,15,16,17 and inbred layer lines exhibiting heritable differences in C. jejuni colonisation following experimental inoculation18,19. Breeding for Campylobacter resistance may also benefit avian intestinal health and productivity. In some commercial broiler chicken lines, experimental inoculation of C. jejuni elicits damage to the intestinal mucosa, diarrhoea and impaired weight gain20,21. In some birds, Campylobacter colonisation may impact intestinal barrier function22, nutrient absorption and transporter expression21,23, the translocation of Escherichia coli to extra-intestinal organs24, and may lead to a dysbacteriosis. In other studies, where Campylobacter inoculation was via natural exposure mimicking field exposure, there was no association between Campylobacter levels and bird growth rate25 or gut pathology26, and selection for Campylobacter resistance remained compatible with other breeding goals26.

A previously published genome-wide association study (GWAS) on C. jejuni intestinal colonisation, where phenotypes were analysed as a binary trait in a novel dual-purpose chicken breed, revealed a resistance-associated locus on chromosome 11 near the CDH13 gene encoding T-cadherin, and a second candidate locus on chromosome 5 was identified close to calmodulin (CALM1), a calcium-activated modulator of cadherin function17. Both genes differed in relative expression in a manner associated with resistance17. Studies in inbred layer lines 61 and N found heritable differences in caecal colonisation by various C. jejuni strains18. Initial low-resolution mapping studies using reciprocal (61♀ × N♂) and (N♀ × 61♂) F1 crosses and the progeny of an (N♀ × 61♂) F1♂ x N♀ backcross indicated that resistance was controlled by a single autosomal dominant locus18, but subsequent analysis of a backcross population using 1243 fully-informative single nucleotide polymorphism (SNP) markers revealed quantitative trait loci (QTL) on chromosomes 7, 11 and 1419. Using a ninth-generation advanced intercross (61xN) line and a 600K genome-wide SNP array, the location of the QTL on chromosome 14 was confirmed and refined, and additional QTLs were identified on chromosomes 4 and 16, indicating potential involvement of the Major Histocompatibility Complex (MHC) region19. QTL for resistance of chickens to enteric carriage of Salmonella have been reported at the same regions of chromosome 1427 and 1627,28,29 that were implicated in host resistance to C. jejuni colonisation. Analysis of caecal gene expression in chicken lines of varying susceptibility to Campylobacter colonisation has identified transcriptional signatures associated with differential resistance, including genes influencing the immune response13,14,15,16.

In the present study, we conducted a comprehensive genome-wide association study on a commercial pedigree broiler population exposed to Campylobacter naturally present in the litter. The genomic architecture of resistance was analysed using imputed high-density SNP genotyping, and resistance was further characterised by transcriptome analyses of intestinal tissue from birds of the predicted resistant or susceptible genotypes at the corresponding extremes of caecal Campylobacter colonisation phenotype. Our data provide valuable insights into the prospects for genetic control of Campylobacter in poultry.

## Results

### Descriptive statistics and genetic parameters affecting Campylobacter levels

In the 3000 commercial broilers sampled, we estimated a mean number of caecal C. jejuni ± standard deviation of 7.057 ± 1.023 log10 colony-forming units (CFU) per gram (g), with a maximum of 10.64 log10 CFU/g and minimum of 2 log10 CFU/g, equal to the limit of detection by direct plating. We were unable to detect Campylobacter by direct plating for just 10 out of 3000 caecal samples (0.003%). Sex had a marginal effect on Campylobacter levels (P < 0.05), with males (n = 2141) having a higher Campylobacter load (7.178 ± 0.034 log10 CFU/g) compared to females (n = 859; 6.930 ± 0.032 log10 CFU/g). Campylobacter levels showed seasonal variability, with date of sampling having a significant effect (P < 0.05), while body weight did not have a significant effect on the trait. No significant maternal effects were identified. Modest, but statistically significant heritability for caecal Campylobacter level was estimated (h2 = 0.11 ± 0.03).

### Genetic association analyses

From 50K SNP genotype data obtained for 3000 broilers, genotypes were imputed to 600K SNP for 2718 birds. 282 samples failed the pedigree integrity testing prior to imputation and were removed. Imputation was not possible on chromosome 16 due to the complexity of the MHC region. These data were analysed using two genome-wide association methodologies: a genome-wide association study (GWAS) for single SNPs and regional heritability mapping (RHM) to consider associations with genomic regions comprising multiple consecutive SNPs. Using the SNP data on chromosome 16 the MHC haplotypes of these birds were assembled and a haplotype analysis was performed.

#### Genome-wide association study

Manhattan and quantile–quantile (Q–Q) plots for the GWAS results using the 50K and the imputed 600K arrays are shown in Fig. 1A,B, respectively. SNPs significantly associated with log-transformed caecal Campylobacter levels, after a Bonferroni correction for multiple testing, are shown in Table 1. Multi-dimensional scaling analysis revealed no population substructure in these commercial broilers (Supplementary Fig. S1). GWAS analysis using the 50K SNP DNA array identified one SNP on chromosome 16 significantly associated with the log-transformed number of Campylobacter in the caeca at the genome-wide level and another ca. 100 SNPs on the same chromosome with a suggestive genome-wide significant association with the trait. All the significant SNPs were in high linkage disequilibrium (LD) and located within the same LD block (length 230 kb), in the same MHC region (Supplementary Fig. S2). Three MHC haplotypes were constructed. The recombination events were limited with only one event identified in the TRIM region of MHC of one sample. One of the MHC haplotypes (AA) was detected at much higher frequency (88%; n = 2645), compared to the other two (BB = 0.4%; n = 12 and AB = 11.6%; n = 343). The ensuing MHC haplotype analysis identified statistical significant (P < 0.05) associations between the MHC haplotypes and the log-transformed number of Campylobacter in the caeca. The prevalent MHC haplotype AA was associated with colonisation resistance accounting for 1 log10 difference in the Campylobacter levels compared to the susceptible BB haplotype; a significant favourable dominance deviation was also observed for the heterozygous haplotype AB (Supplementary Fig. S3).

Additionally, one SNP with a genome-wide significant association and another one, located 1.4 Mb away, with a suggestive association were identified on chromosome 26. Similarly, two SNPs identified on chromosome 14, albeit 7.5 Mb apart from each other, crossed the suggestive genome-wide significant threshold. The significant SNPs on chromosomes 14 and 26 were not in LD (on chromosome 14: r2 = 0 between AX-75778268 and AX-75798135 markers; on chromosome 26: r2 = 0 between AX-76346701, AX-76339934 markers). The GWAS analysis using the imputed 600K SNP array information confirmed the associations on chromosomes 14 and 26.

The additive and dominance genetic effects, and the proportion of the total phenotypic variance explained by each of these SNPs are summarised in Table 1. The SNPs on chromosome 14 and 16 had a significant additive effect (ranging from 0.3 to 0.6 log10 CFU/g), while the SNPs on chromosome 26 had a significant dominant effect (ranging from 0.3 to 0.4 log10 CFU/g) on Campylobacter levels. The significant SNPs in the QTL region on chromosome 16 accounted for 6% of the phenotypic variance, while collectively all the SNPs in the three candidate regions accounted for 17% of the phenotypic variance of caecal Campylobacter levels.

#### Regional heritability mapping

A Manhattan plot and Q–Q plot for the RHM analysis are shown in Supplementary Fig. S4. Details of the significant SNP windows are presented in Supplementary Table S1. RHM confirmed the significant associations on chromosome 16 previously identified by the GWAS. Moreover, RHM detected one more suggestive significant association on chromosome 19.

### SNP and candidate region annotation

Most of the significant SNPs identified in the GWAS analysis were located upstream (40%) or downstream (34%) of predicted genes or within introns (19%). However, eight of the SNPs on chromosome 16, one SNP on chromosome 26 and one SNP located within the significant RHM SNP-window on chromosome 19 were found within exons. The exonic variants on chromosome 16 were located within TRIM10 (tripartite motif 10), TAP1 (transporter associated with antigen processing 1), RACK1/GNB2L1 (receptor for activated C kinase 1), TRIM27 (tripartite motif 27), and TRIM32 (tripartite motif 32). The SNP on chromosome 19 corresponds to a synonymous variant within HIP1 (huntingtin interacting protein 1), and the SNP on chromosome 26 corresponds to a missense variance within ADORA3 (adenosine receptor 3). Details of the SNP annotation are presented in Supplementary Table S2.

The candidate QTL regions for caecal Campylobacter levels contained a relatively small number of genes, collectively comprising 173 annotated protein-coding genes, 7 microRNAs and 2 snoRNAs. Details of the genes and non-coding RNAs located in the candidate regions are presented in Supplementary Table S3.

### Transcriptomic analyses

As many traits are associated with altered expression of genes within QTLs30, we performed RNA-Seq analysis of the caecal tonsil transcriptome of 23 broilers to identify expression QTLs (eQTLs) and potential allelic imbalance of candidate genes within the regions associated with Campylobacter resistance identified by GWAS and RHM analyses. The birds used were selected to represent combinations of predicted resistant or susceptible genotypes (based on the identified QTLs) that were correspondingly at the extremes of caecal C. jejuni colonisation phenotype (resistant n = 9, susceptible n = 7), as well as the average (n = 7) of measured caecal Campylobacter load (Supplementary Table S4).

#### Differential gene expression analysis

After false discovery rate (FDR, P < 0.05) correction for multiple testing, 3 protein-coding genes were found to exhibit significant differential expression (Table 2). The three differentially expressed (DE) genes were located within the QTL region on chromosome 16 (BF2, ENSGALG00000032221, ENSGALG00000024357). The BF2 gene is an MHC class I gene while the other two are both BG-like genes belonging to the butyrophilin family. In order to identify more subtle patterns of differential expression, a relaxed significance threshold of unadjusted P value of 0.001 was implemented. A total of 33 genes exhibited differential expression between high-, average-, and low-colonised birds at this threshold (Table 2; Fig. 2). Among these DE genes were several related with the immune response (ILF2, ATG7, BG1, BF2, BF1, TAP1, ZNF692). Interestingly, there were three DE BF2 transcripts, two of which were downregulated (ENSGALT00000079478, ENSGALT00000077683) and the other upregulated (ENSGALT00000087837) in the resistant birds. There were also two DE BF1 transcripts, both of which were upregulated in the resistant birds.

We performed separate cis- and trans- based eQTL analyses for the significant and suggestive significant SNP markers identified by the GWAS and RHM:

#### Cis-analysis

After false discovery rate (FDR, P < 0.05) correction for multiple testing, we detected 102 significant cis-eQTL (Supplementary Table S5). Of those, 90 were associations between SNPs in high LD, located in the same QTL region on chromosome 16, and the expression of a single gene, BG-like antigen 1 (BG1) (Fig. 3A). This eQTL had a log10 allelic-fold-change of 2.03. Four more cis-eQTLs were identified for ENSGALG00000032221 and three novel gene transcripts on chromosome 16 (ENSGALT00000065054, ENSGALT00000049453 and ENSGALT00000085167). Another three significant cis-eQTLs were detected within TMEM11 (transmembrane protein 11) and the COPS3 (COP9 signalosome subunit 3), two genes located within the QTL region on chromosome 14 (position 4,552,835–4,560,698 and 4,767,396–4,781,731, respectively) (Fig. 3B,C, Supplementary Table S5).

#### Trans-analysis

We detected a total of 13 significant trans-eQTLs within the QTL region on chromosome 19 and two on chromosome 26 (Supplementary Table S6). Most of these predicted trans-acting elements are for genes related with metabolic processes. The trans-acting SNP on chromosome 26 is for a microRNA (gga-mir-1553) located on chromosome 7, close to the peak of a previously identified QTL for C. jejuni resistance using a back-cross population of inbred lines 61 and N19.

#### Allele-specific expression analysis

If an individual is heterozygous for a cis-acting SNP it is expected that the two alleles of the gene will be expressed unequally causing allelic expression imbalance. To verify the cis-QTLs detected above, and identify additional ones, we identified genetic variation within the QTL regions identified by the GWAS and RHM using the RNA-Seq data and performed allele-specific expression (ASE) analysis for all the SNPs located within these regions. Several significant ASE events were identified in all QTL regions (mean P value ≤ 0.05 with at least 4 heterozygous animals). Specifically, 14 significant ASE events were identified for 3 genes located on chromosome 14, 30 for 6 genes on chromosome 16, 11 for 2 genes and one microRNA (gga-mir-142) on chromosome 19, and 35 for 5 genes on chromosome 26 (Supplementary Table S7). A highly significant ASE event was identified on chromosome 14 (for the QTL located at 12 MB) for chloride voltage-gated ion channel 7 (CLCN7). ASE results on chromosome 16 were consistent with the presence of a cis-acting polymorphism in BG1, with 6 SNPs in high-LD showing allelic imbalance (P < 10–13). Moreover, ASE analysis highlighted potential cis-acting polymorphisms for other genes of interest in the region, namely MHC class 1 (BF1 and BF2) and class 2 (BLB2 and BLB1) (Fig. 4A–C). Within BF2, 13 different SNPs showed highly significant ASE, with P values < 10–305 (Fig. 4A). Chromosomes 19 and 26 also contain several immune-related genes showing significant ASE: angiopoietin-related protein 2 (ANGPTL2), C–C motif chemokine ligand 4 (CCL4), complement C3b/C4b receptor 1-like (CR1L), C4b-binding protein (C4BP), polymeric immunoglobulin receptor (PLGR) and BCL2 antagonist/killer 1 (BAK2).

#### Validation of selected differentially transcribed genes

Specific qRT-PCR assays were devised to validate the transcript levels measured by RNA-Seq using the same RNA samples. Four genes located in the MHC region on chromosome 16 (BF1, BF2, ENSGALG00000024357 and ENSGALG00000032221), were found to be differentially transcribed in the caecal tonsils of birds of predicted resistant or susceptible genotypes with divergent caecal Campylobacter load, after adjusting for sex and seasonality. This confirmed that the expression of each gene differed significantly between resistant and susceptible birds (P ≤ 0.05 after Tukey’s HSD post-hoc test adjustment). The correlation estimates (r2) between the quantified by qRT-PCR and RNA-Seq expression levels of BF1, BF2, ENSGALG00000024357 and ENSGALG00000032221 were 0.85, 0.78, 0.83 and 0.71, respectively (Supplementary Fig. S5).

### Pathway, network and functional enrichment analyses

#### Pathway analysis using encoded genes in the candidate regions for Campylobacter resistance

Based upon the significant heritability estimate and the large amount of genetic variance accounted for by the identified SNPs, we reasoned that the corresponding QTL regions may contain genes contributing to common pathways associated with resistance to Campylobacter colonisation. We therefore identified the sets of annotated genes lying within QTL regions and sought evidence of gene set enrichment. Ingenuity Pathway Analysis (IPA) (https://digitalinsights.qiagen.com/products-overview/discovery-insights-portfolio/analysis-and-visualization/qiagen-ipa) found these genes to be enriched for pathways involved in innate and adaptive immune responses, antigen presentation, inflammatory responses, calcium signalling, renin–angiotensin signalling, epithelial cell signalling and interactions (Fig. 5). Moreover, three networks of molecular interactions related to ‘immunological diseases’, ‘cell death and survival’, and ‘molecular transport and protein trafficking’ were constructed using the list of genes in the candidate regions (Fig. 6). Core genes featured in these networks included immune-related genes (such as NFkB-complex, RAS, ITPR, CCL5, CCL4, interferon alpha and beta), MHC-related genes (such as TAP1, TAP2, TAPBP) and calcium (Ca+) regulated genes. We subsequently extracted the gene ontology terms for each of these genes and performed functional annotation clustering analysis. The genes were organised into 41 clusters, each given an enrichment score (ES). The first (ES = 4) and the second (ES = 3.5) clusters were both enriched for genes functionally annotated as involved in ‘antigen processing and presentation via MHC class I and class II molecules’ (including BF1, BF2, BLB1, BLB2, DMB1, DMB2, TAP1, and TAP2) (Supplementary Table S8).

#### Pathway analysis using DE genes in birds with different Campylobacter colonisation levels

Functional analysis of the DE genes using the IPA software showed significant enrichment for pathways related with immune response (interferon signalling, antigen presentation, immunodeficiency signalling) and metabolism (protein ubiquitination, glutamate degradation). Moreover, one network of molecular interactions related to ‘cell death and survival, and organismal injury and abnormalities’ was constructed based on the DE genes. A core gene in this network was HSP90AB1 which encodes a member of the heat shock protein 90 family. Functional annotation clustering of these genes uncovered significant enrichment (E.S. = 1.2) for one gene cluster related with immune response, defence response, response to stress, symbiosis, encompassing mutualism through symbiosis, interspecies interactions between organisms.

## Discussion

We sought to investigate the genetic basis of resistance of chickens to Campylobacter colonisation and evaluate the potential for selective breeding of poultry with enhanced resistance to control Campylobacter at farm level. Using samples from 3000 commercial chickens exposed to Campylobacter, we detected heritable variation associated with caecal Campylobacter levels and identified genomic markers and regions associated with colonisation that can be used to inform breeding decisions. Candidate genes, cis- and trans- acting elements, canonical pathways and networks, and MHC haplotypes that were implicated in resistance to Campylobacter colonisation were also identified.

We estimated significant heritability (h2 = 0.11) for caecal Campylobacter colonisation. This was lower compared to a previous estimate for this trait using the progeny of crosses of inbred White Leghorn chicken lines with differing resistance to Campylobacter colonisation (h2 = 0.25)19. The difference is likely attributable to the use of field data on naturally colonised broilers in the present study compared to experimental challenge of inbred lines. The heritability of resistance to Campylobacter colonisation is similar to that observed for other livestock pathogens and diseases, such as bovine tuberculosis (h2 = 0.09–0.17)31,32 and bovine and ovine mastitis (h2 = 0.10–0.20)33, where the development of genetic evaluations to guide breeding decisions was deemed to be feasible.

In a previous study of the same commercial chicken population, Campylobacter colonisation levels were not significantly phenotypically and genetically correlated with key production traits such as body weight, nutrient absorption and gut health; this highlights that the presence of Campylobacter in the caeca of chickens was not detrimental to the birds studied and that co-selection for Campylobacter colonisation resistance with other breeding goals is feasible26. However, the low heritability estimates indicate that a large proportion of phenotypic variance in Campylobacter colonisation is determined by non-genetic factors that merit further investigation. Moreover, the high frequency of resistance-associated alleles in the studied population of commercial birds suggests limited scope for improvement in the specific pedigree line.

In the present study, we assumed a uniform exposure of birds to Campylobacter during the 16 months of sampling. A seasonal, batch and sex effect on Campylobacter colonisation was detected and fitted in the GWAS, eQTL and differential expression models of analysis to adjust for these sources of systematic variation. Season has been previously reported to affect the colonisation phenotype in chickens34,35, with this linked to an elevated incidence of human campylobacteriosis during summer36. The basis of this seasonal effect is not entirely clear37. Moreover, while Campylobacter was routinely detected in the environment of the birds sampled, we cannot preclude the possibility that the bacterial species and sequence types present varied over time.

Consistent with a previous report of paternal effects on caecal C. jejuni colonisation in broilers12, we detected a significant effect of sex on the colonisation phenotype, with males having higher mean caecal counts of Campylobacter. Male susceptibility to Campylobacter has been also reported in human and mouse studies38,39. Sex-related differences in immune response and survival rate of broiler chickens have been reported for a range of pathogens in chickens40. Male broilers were found to be more susceptible to infectious disease and this was attributed to a less efficient immune response compared to females40. Moreover, there are differences in gene expression and responsiveness to bacterial lipopolysaccharide between macrophages from males and females that have been attributed to the lack of dosage compensation of the genes on the Z chromosome, which includes the interferon cluster41. Apart from seasonality and sex, other non-genetic factors may explain the observed variation in Campylobacter colonisation, including strain variation42, the time and level of exposure relative to sampling43, coinfections44, variation in the gut microbiota45,46, and diet and feed intake47,48. Our results should be interpreted in the context of the limitations and advantages of field-based genome-wide association studies49,50. Compared to controlled challenge experiments, unknown and uncontrolled exposure to non-genetic factors may reduce the power of a field study but do not constitute a fatal flaw in demonstrating host genetic differences in resistance49. Moreover, the demonstration of heritable resistance in field studies that simulate commercial practice is highly relevant to the production system into which selectively-bred birds would be introduced. Ideally we would challenge age-matched birds of predicted resistant or susceptible genotypes synchronously by oral gavage with defined doses of a range of C. jejuni strains to validate predictions. However, the number of generations between the population used for the genome-wide association study and broiler flocks currently available, together with the low frequency of susceptibility-associated variation, would make it challenging to validate associations between specific genomic regions and caecal C. jejuni colonisation.

In line with our previous findings using the progeny of crosses of inbred chicken lines19, the major histocompatibility complex region on chromosome 16 was implicated in resistance to Campylobacter colonisation in commercial broilers. Using genomic data we were able to identify a strong QTL in the MHC region explaining most of the trait-associated genetic variation, and the QTL overlapped with expression QTLs detected by RNA-Seq analysis of caecal tonsil tissue from birds at the extremes of the colonisation phenotype. Within this QTL region, 100 SNP markers were found in high LD and collectively corresponded to three MHC haplotypes. These haplotypes were relatively stable, since only one recombination event was identified in the TRIM region of MHC, and they were associated with distinct colonisation phenotypes, with the more prevalent one associated with colonisation resistance accounting for 1 log10 difference in the Campylobacter levels.

Despite the MHC region being polymorphic and repetitive, making it challenging to identify causative genes and mutations underlying disease resistance, our analyses revealed a number of candidate genes for Campylobacter resistance that warrant further investigation. Specifically, the eQTL and ASE analyses showed evidence for cis-acting elements related with the expression of the BG-like antigen 1 (BG1) gene, major (BF2) and minor (BF1) MHC class I genes, the major (BLB2) and minor (BLB1) MHC class II beta chain genes. In addition, the BF2, BF1, BG1 and two BG-like genes were found to be differentially expressed in chickens with different caecal Campylobacter levels. The major MHC class II beta chain gene (BLB2) gene is widely expressed at high levels in hematopoietic cells, whereas the minor MHC class II beta chain gene (BLB1) is generally poorly expressed, although highly expressed in spleen, intestinal epithelial cells, and particularly the caecal51. The CIITA transactivator gene that controls expression of MHC class II genes52 has been found to be differentially expressed in chickens with high and low C. jejuni colonisation levels in a previous experimental RNA-Seq study of the caecal tissue16. Previous human studies have associated MHC class II alleles as genetic risk factors for inflammatory neuropathies secondary to C. jejuni infection53. C. jejuni can increase surface expression of MHC class II on murine bone marrow-derived dendritic cells and activate them to induce T cell responses54, however the extent to which this applies in chickens is ill-defined. In that study, MHC class I genes BF1 and BF2 were also found to be differentially expressed in the caeca of chickens with high and low C. jejuni levels16. Furthermore, similar to our findings, MHC-related BG genes have also been reported to be differentially transcribed in the spleen of two chicken lines that differ in susceptibility to C. jejuni colonisation. The BG region of MHC is very repetitive, and it is therefore difficult to distinguish specific BG genes due to copy number variation55. In mammals, gammadelta (γδ) T cells recognize a wide variety of ligands, among members of the butyrophilin family encoded by BG genes, and recent studies suggest that the same may apply in chickens56. In mammals, γδ T cells enriched in the intestinal epithelium and play a key role in control of mucosal immunity and inflammation57. Indeed, human γδ T cells proliferate after exposure to C. jejuni and have been implicated in GBS following C. jejuni infection58. The role of γδ T cells and recognition of BG ligands during Campylobacter infection in poultry therefore merits further study.

In the present study, network analyses detected interferon signalling among the pathways associated with resistance. Interferon-γ (IFN-γ) has been reported to be induced following Campylobacter infection of avian cells59 and chickens in a breed-dependent manner60, and may underlie breed-specific differences in gut inflammation and pathology20,61. Furthermore, multiple interferon-related genes were found to be differentially expressed in the caecal transcriptome of chickens with high and low C. jejuni colonisation levels in a previous RNA-Seq study16. Interestingly, the major class I and II molecules, as well as other MHC related genes have interferon response elements (IREs) in their promoters which respond to IFN-γ and therefore their differential expression may be subject to interferon regulation. Furthermore, our recent analysis of whole genome sequence of commercial broilers and layers, and the transcriptome of isolated macrophages from a broiler-layer F2 sibling backcross, also revealed substantial differences in the expression of interferon-regulatory factors (IRF) family members as well as in the BLB1 and BLB2 genes between individual birds62 that could underlie this phenotype.

The present study identified two distinct QTLs on chromosome 14, both located within the interval of a previously identified QTL for Campylobacter resistance using a backcrossed ([61 × N] x N) population of inbred lines 61 and N19. One of these QTLs (located at 12 Mb) overlaps with a QTL identified for the same trait using a ninth generation advanced intercross population of these lines19, as well as a QTL for resistance to Salmonella colonisation in chickens27, suggesting that a mechanism of resistance common to both pathogens may exist. CREB binding protein (CREBBP), a key immune regulatory protein implicated in Salmonella resistance in chickens63, lies in close proximity to the marker of this QTL. In the present study, a mutation (14:12,556,836, C to T, splice donor variant) with a predicted high impact on the encoded protein of this gene was significantly (P < 0.05) associated with Campylobacter colonisation resistance (data not shown). Furthermore, pathway analysis in the present study confirmed the enrichment for CREB signalling reported previously in inbred lines19. The other QTL region (located close to 5 Mb) on chromosome 14 in the present study overlapped with an expression QTL for Campylobacter resistance. Specifically, the SNP marker significantly associated with Campylobacter resistance was also a cis-acting element for two genes (upregulates COPS3 and downregulates TMEM11). The protein encoded by COP9 signalosome complex subunit 3 gene possesses kinase activity that acts as a site for complex phosphorylation of many regulators involved in signal transduction such as I-kappa-B-alpha, p105, and c-Jun64. This protein is part of a complex that plays a key role in diverse cellular processes including cytokine signalling and antigen induced responses65.

Several immune-related genes in the QTL regions on chromosomes 19 and 26 showed evidence of allele-specific expression. Among these genes were the polymeric immunoglobulin receptor (PLGR) which is highly expressed in intestinal epithelial cells and mucosa, and plays a crucial role in the transcytosis of polymeric soluble immunoglobulins and immune complexes to the gut mucosal surface66. Genes and pathways involved in the immunoglobulin production and function were reported to be upregulated in chickens relative resistance to C. jejuni16. PLGR has been associated with intestinal immune defence against the lumen-dwelling parasite Giardia in mice67. The QTL for Campylobacter resistance identified on chromosome 26 encompasses the calcium/calmodulin-dependent protein kinase IG which belongs to a calcium-triggered signalling cascade and phosphorylates the transcription factor CREB (https://www.uniprot.org/uniprot/Q96NX5). A previous GWAS study of C. jejuni resistance identified a suggestive significant association proximal to the calmodulin gene17. Intracellular calcium levels in the intestinal epithelium are affected by C. jejuni in some lines23, however the extent to which this affects bacterial colonisation, or is induced by it, is unclear.

Our pathway analysis also showed enrichment for other innate and adaptive immune related pathways in association with Campylobacter resistance. Of increased interest is the pathway related with IL-17 signalling since several previous studies suggested that IL-17 signalling and TH-17 responses play a role in resistance to Campylobacter colonisation in chickens following experimental inoculation13,16,19,61,68,69. Future studies could seek to characterise the timing and magnitude of such responses in birds of predicted resistant or susceptible genotypes upon exposure. An important factor to be considered in future studies of this type is to characterise any concurrent infections and the subsequent relationship with Campylobacter and the host immune response. Pathway analysis also detected enrichment for the renin–angiotensin system, components of which have been reported to be upregulated in a chicken line exhibiting relative resistance to Campylobacter infection16 and in the gastric mucosa of Helicobacter pylori infected humans70.

The advent of rapid genome-editing technology, for example reliant on modification of primordial germ cells implanted into sterile recipients during embryo development71, provides a potential means to validate the role of genetic variation in Campylobacter resistance.

## Conclusion

Our comprehensive genomic analyses estimated significant heritability of Campylobacter resistance in a commercial broiler population and identified QTLs, transcripts and networks in common with previous studies. A clear association with the MHC locus on chromosome 16 was identified, including detection of differentially transcribed MHC-related genes in the QTL interval in birds at the extreme of colonisation phenotype. The low frequency of susceptibility-associated alleles in the broiler population studied precluded the selection of predicted resistant or susceptible birds for experimental challenge. The QTLs identified accounted for a c. 2 log10 CFU/g difference in caecal C. jejuni colonisation, sufficient to provide a significant reduction in the predicted number of human cases based on mathematical modelling10. However, resistance-associated variation was already highly prevalent in the population studied and environmental factors, which played a far greater role in the phenotype, may be more amenable to rapid and effective intervention. A multifactorial approach addressing both genetic and non-genetic factors is therefore needed to reduce Campylobacter levels in poultry and the incidence of the human disease attributed to this source.

## Methods

### Animals and sampling

A total of 3000 birds of an outbred pure-bred commercial broiler line from the Aviagen breeding programme were housed within a non-bio-secure environment (referred to as the sib-test environment) that aimed to resemble broader commercial conditions and where full siblings and half siblings of selection candidates are placed72. Birds were fed standard maize-based starter, grower and finisher diets in line with industry practice. All birds throughout the study received the same vaccinations as per commercial regime and were reared under the same management practices and environmental parameters26. Birds were naturally exposed to Campylobacter spp. under these conditions, as confirmed by routine sampling of the environment using the ‘boot sock’ method described previously25. Birds were culled and phenotyped when they reached the age of five weeks. This was performed in batches of 100 birds (50 males and 50 females) giving a total of 3000 birds phenotyped over a period of 16 months. After culling of birds by cervical dislocation by trained personnel, cardiac blood was collected for DNA extraction, the two caeca were collected for enumeration of viable Campylobacter, and the two caecal tonsils were stored in RNAlater (Thermo Fisher Scientific, Waltham, USA) for subsequent RNA extraction.

### Phenotyping and genotyping

To enumerate Campylobacter, serial ten-fold dilution series of weighed contents of the two caeca were separately prepared to 10–7 in phosphate-buffered saline and 100 μL of each dilution plated to modified charcoal deoxycholate (mCCDA) agar supplemented with cefoperazone (32 mg/L) and amphotericin B (10 mg/L; Oxoid), followed by incubation for 48 h under microaerophilic conditions (5% O2, 5% CO2, and 90% N2) at 41 °C. Dilutions were plated in duplicate and colonies with morphology typical of Campylobacter were counted and expressed as CFU/g. The theoretical limit of detection by the method used was 100 CFU/g. In instances where no colonies were observed after direct plating, a Campylobacter load equal to the theoretical limit of detection was assumed, as enrichment to confirm the absence of Campylobacter in caecal contents was not performed.

All the birds were genotyped with a proprietary 50K high-density genome-wide SNP array and then imputed using AlphaImpute73,74 to the 600K SNP Affymetrix Axiom HD array75 based on parent, grand-parent and great-grand-parent 600K SNP Affymetrix data. Of 3000 birds sampled, genotypes for 2718 birds were successfully imputed. Imputation failures likely reflect a lack of compatibility between the pedigree information and the genotypic data.

### Heritability analysis

As caecal Campylobacter levels were not normally distributed, all counts were log-transformed and expressed as log10 CFU/g. Genetic parameters were estimated for caecal Campylobacter colonisation resistance using a mixed linear univariate model that included the date of sampling and the sex as fixed effects, and the random effect of the individual birds linked to each other with the pedigree genetic relationship matrix. Body weight and maternal effects were also tested but their effects on Campylobacter levels were not significant and therefore were not included in the final model. Genetic relationships between birds were calculated using a three generations pedigree and included in the analyses. The heritability of the trait was calculated as the ratio of the additive genetic variance to the total phenotypic variance. The analysis was performed using ASReml v4.076.

### Genome-wide association study

The 50K and 600K SNP genotype data were subjected to quality control measures using PLINK v1.0977 with parameters of minor allele frequency > 0.05, call rate > 95% and Hardy–Weinberg equilibrium (P > 10−6). After quality control, 37,498 and 288,381 SNP markers remained for further analysis (from the 50K and 600K datasets, respectively). Positions of SNP markers were obtained using the GalGal5 annotation, available via the Ensembl Genome Browser (http://www.ensembl.org). Population stratification was investigated using a genomic relatedness matrix generated from all individuals. This was converted to a distance matrix that was used to carry out classical multi-dimensional scaling analysis (MSA) using the R package GenABEL78 to obtain its principal components. The GEMMA v0.98.1 algorithm79 was used to perform GWAS analyses using a standard univariate linear mixed model in which date of sampling and sex were fitted as fixed effects and the genomic relatedness matrix among individuals was fitted as a polygenic random effect. After Bonferroni correction for multiple testing, significance thresholds for analysis with the 50K array were P ≤ 1.33 × 10−6 and P ≤ 2.66 × 10−5 for genome-wide significant levels (i.e., P ≤ 0.05) and suggestive significant levels (namely one false positive per genome scan), respectively, corresponding to − log10(P) of 5.87 and 4.47. The significance thresholds for the 600K array after Bonferroni correction were P ≤ 1.73 × 10−7 and P ≤ 3.46 × 10−6 corresponding to − log10(P) of 6.76 and 5.45. The extent of linkage disequilibrium (LD) between significant SNPs located on the same chromosome regions was calculated using the r-square statistic of PLINK v1.0977.

Individual markers found to be significant in the previous step were further examined with a mixed model that included the same fixed effects as used in the GWAS, the fixed effect of the corresponding SNP locus genotype and the random effect of the animal. Additive (a) and dominance (d) effects, and the proportion of total phenotypic variance (PVP) due to each SNP locus were calculated as follows:

$$a = \frac{{({\text{AA}} - {\text{BB}})}}{2}$$
$$d = AB - \frac{{({\text{AA}} + {\text{BB}})}}{2}$$
$$PV_P = \frac{(2pq (a + d (q - p)) 2)}{{{\text{VP}}}}$$

where AA, BB and AB were the predicted trait values of the respective genotypic classes, p and q were the allelic frequencies of A and B at the SNP locus, and VP was the total phenotypic variance of the trait. The latter were estimated with the same model used for the heritability estimate. All analyses were run with ASReml v4.076.

### Regional heritability mapping

RHM analyses were performed using DISSECT80 fitting genomic regions of 20 SNPs in sliding windows along each chromosome with the same fixed effects as the ones used in the single SNP GWAS described above. The significance of genomic regions was assessed with the likelihood ratio test statistic, which was used to compare the RHM model where both the whole genome and a genomic region were fitted as random effects against a base model that excluded the latter effect. Only the 50K data was analysed with this method. A total of 1915 regions were tested across the genome. After adjustment for multiple testing, using the Bonferroni correction, significance thresholds were P ≤ 2.63 × 10−5 and P ≤ 5.26 × 10–5 for genome-wide levels (P ≤ 0.05) and suggestive levels (namely one false positive per genome scan), respectively, corresponding to − log10(P) of 4.57 and 3.27.

### SNP and candidate region annotation

All significant SNPs identified in the GWAS and RHM analyses were mapped to the GalGal5 reference genome and annotated using the Ensembl Variant Effect Predictor (http://www.ensembl.org/Tools/VEP). Moreover, all the genes that were located within the 20 SNP windows found to be significant by RHM; and the 250 kb 5′ and 3′ regions of the significant SNP markers identified by the GWAS were annotated using GalGal5 data obtained by the BioMart data mining tool (http://www.ensembl.org/biomart/martview/). This allowed us to catalogue all the genes that were located in the vicinity of the identified significant SNP markers for Campylobacter colonisation resistance.

### Transcriptomic analyses

Total RNA was prepared from the caecal tonsils of 23 broilers, selected on their genotype (allele combination in the significant identified markers) and caecal Campylobacter load, after correction for other sources of systematic variation (sex and date of sampling). Details of the birds selected are shown in Supplementary Table S4. RNA was extracted using the RNeasy Mini Kit (Qiagen Hilden, Germany) according to manufacturer’s instructions. The resultant RNA was checked for quality using the Agilent Tapestation 2200, and all samples were of high quality with RNA Integrity Numbers (RIN) greater than 9. Library preparation was performed by Edinburgh Genomics (http://genomics.ed.ac.uk/) using the Illumina TruSeq mRNA (poly-A selected) library preparation protocol (Ilumina; Part: 15031047, Revision E). The mRNA was sequenced by Edinburgh Genomics at a depth of > 40 million strand-specific 75 bp paired-end reads per sample, using an Illumina HiSeq 4000. Expression levels for the 23 samples were estimated using Kallisto v0.43.081. Rather than aligning RNA-seq reads to a reference genome, reconstructing transcripts from these alignments and then quantifying expression as a function of the reads aligned, Kallisto employs a ‘lightweight’ algorithm, which first builds an index of k-mers from a known transcriptome. As a reference transcriptome, we obtained from Ensembl v89 the set of GalGal5 cDNAs and ncRNA transcripts (ftp://ftp.ensembl.org/pub/release-89/fasta/gallus_gallus/cds/Gallus_gallus.Gallus_gallus-5.0.cds.all.fa.gz, and ftp://ftp.ensembl.org/pub/release-87/fasta/gallus_gallus/ncrna/Gallus_gallus.Gallus_gallus-5.0.ncrna.fa.gz; n = 38,118 total transcripts, representing 10,846 protein-coding genes and 937 non-coding genes). Expression levels were then estimated directly (i.e., in an alignment-free manner) by quantifying exact matches between reads and k-mers. Expression is reported per transcript as the number of transcripts per million, and is summarised to the gene level as described previously82.

#### Differential expression

Differential expression analysis was run on caecal tonsils from birds with high, intermediate and low Campylobacter load, after adjusting for sex and seasonality, using the Kallisto output with the R/Bioconductor package ‘Sleuth’ v0.29.083. Differential expression was considered significant for FDR corrected P values ≤ 0.05 and suggestive significant for uncorrected P values ≤ 0.001. Additional differential expression analyses were performed using the qRT-PCR output for a subset of the genes found to be significantly differentially expressed in the initial RNA-Seq analysis. Least square mean pairwise comparisons between different Campylobacter levels were conducted. Tukey’s HSD post-hoc test adjustment was applied at a significance level of 0.05.

#### Expression QTL analysis

eQTL analyses were performed using the R package Matrix eQTL v2.1.084. The number of transcripts per million, derived from Kallisto analysis as described above, were used as a measure of gene expression. Several covariates (log10 transformed Campylobacter counts, sex, date of sampling) were included in the association analysis. Cis- and trans-eQTLs were obtained, considering cis-acting SNPs to be within 100 kb from the 5′ start or 3′ end of a known gene. P values were corrected using false discovery rate (FDR) estimated with Benjamini–Hochberg procedure. SNP–gene expression association was considered significant for FDR corrected P values ≤ 0.05. For the significant cis-eQTL we estimated effect size using the log allelic fold-change (aFC) measurement. aFC is defined as the log-ratio between the expression of the haplotype carrying the alternative variant allele to the one carrying the reference allele and was calculated as described85. Briefly, the model assumes an additive model of expression in which the total expression of a gene in a given genotype group is the sum of the expression of the two haplotypes: e(genotype) = 2e_r, e_r + e_a, and 2e_a for reference homozygotes, heterozygotes, and alternate homozygotes, respectively, where e_r is the expression of the haplotype carrying the reference allele, and e_a the expression of the haplotype carrying the alternative allele. The allelic fold change k is defined as e_a = k e_r where 0 < k < ∞. aFC is represented on a log2 scale as s = log2 k.

#### SNP calling and allele-specific expression analysis

Allelic-specific expression was assessed using the R package AllelicImbalance v1.24.090. For every SNP in a region of interest, read counts were obtained for each allele present in a heterozygous animal, provided it was present in > 4 and < 17 heterozygous animals (i.e. 75% of the total animals). SNPs with less than 10 reads were excluded. A binomial test was performed to assess the significance of the difference in allelic count. Allele-specific expression was considered significant if the mean P value across all heterozygotes was ≤ 0.05.

### Quantitative RT-PCR validation of differentially expressed genes

First strand synthesis was performed using 1 μg of total RNA and the Verso cDNA Synthesis Kit (Thermo Scientific) according to the manufacturer’s instructions. qPCR reactions were performed using a hot-start EvaGreen dye-based master mix (the Forget-Me-Not qPCR Master Mix (Biotium), https://biotium.com/technology/pcr-dna-amplification/forget-qpcr-master-mix/) in 20 μL volumes containing 1 × Forget-Me-Not qPCR Master Mix, 0.5 μM of each forward and reverse primer, 50 nM of ROX reference dye and 2 μL of cDNA at a 1:10 dilution in template buffer. Gene-specific primers were designed and purchased from Sigma. Primer sequences are detailed in Supplementary Table S9. The amplification and detection of specific DNA was achieved using the AB 7500 FAST Real-Time PCR System (Applied Biosystems) and the following programme: 95 °C for 2 min followed by 40 cycles of 95 °C for 5 s then 60 °C for 30 s. To confirm the presence of a single PCR product, melting curves were generated by one cycle of 60 °C for 1 min, increasing to 95 °C in 1% increments every 15 s. Samples were run in triplicate and each qPCR experiment contained 3 no-template control wells and a fivefold dilution series in duplicate of pooled caecal tonsil derived cDNA from several birds from which standard curves were generated. The expression of genes were normalised to the geometric mean of three reference genes found previously to be stably expressed in chicken lymphoid organs; r28S, TBP and GAPDH91.

### Pathway, network and functional enrichment analyses

Identification of potential canonical pathways and networks underlying the candidate genomic regions associated with Campylobacter colonisation resistance was performed using the Ingenuity Pathway Analysis (IPA) programme (https://digitalinsights.qiagen.com/products-overview/discovery-insights-portfolio/analysis-and-visualization/qiagen-ipa/). IPA constructs multiple possible upstream regulators, pathways and networks that serve as hypotheses for the biological mechanism underlying the phenotypes based on a large-scale causal network derived from the Ingenuity Knowledge Base. IPA then infers the most suitable pathways and networks based on their statistical significance, after correcting for a baseline threshold92. The IPA score in the constructed networks can be used to rank these networks based on the P values obtained using Fisher’s exact test [IPA score or P score = – log10(P value)].

The gene list for Campylobacter colonisation resistance was also analysed using the Database for Annotation, Visualization and Integrated Discovery (DAVID) v6.893. Gene ontology (GO) was determined and functional annotation clustering analysis was performed. The Gallus gallus background information is available in DAVID and was used for the analysis. The enrichment score (ES) of the DAVID package is a modified Fisher exact P value calculated by the software, with higher ES reflecting more enriched clusters. An ES greater than 1 means that the functional category is overrepresented.

### Ethical approval

As these were commercial birds from an industry breeding programme and were not experimentally inoculated, the study was conducted outside the auspices of the Animals (Scientific Procedures) Act 1986, but was nevertheless subject to scrutiny and approval by the Animal Welfare & Ethical Review Body of The Roslin Institute, University of Edinburgh.