Human leukocyte antigen alleles associate with COVID-19 vaccine immunogenicity and risk of breakthrough infection

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) vaccine immunogenicity varies between individuals, and immune responses correlate with vaccine efficacy. Using data from 1,076 participants enrolled in ChAdOx1 nCov-19 vaccine efficacy trials in the United Kingdom, we found that inter-individual variation in normalized antibody responses against SARS-CoV-2 spike and its receptor-binding domain (RBD) at 28 days after first vaccination shows genome-wide significant association with major histocompatibility complex (MHC) class II alleles. The most statistically significant association with higher levels of anti-RBD antibody was HLA-DQB1*06 (P = 3.2 × 10−9), which we replicated in 1,677 additional vaccinees. Individuals carrying HLA-DQB1*06 alleles were less likely to experience PCR-confirmed breakthrough infection during the ancestral SARS-CoV-2 virus and subsequent Alpha variant waves compared to non-carriers (hazard ratio = 0.63, 0.42–0.93, P = 0.02). We identified a distinct spike-derived peptide that is predicted to bind differentially to HLA-DQB1*06 compared to other similar alleles, and we found evidence of increased spike-specific memory B cell responses in HLA-DQB1*06 carriers at 84 days after first vaccination. Our results demonstrate association of HLA type with Coronavirus Disease 2019 (COVID-19) vaccine antibody response and risk of breakthrough infection, with implications for future vaccine design and implementation.

Article https://doi.org/10.1038/s41591-022-02078-6 failure has not been robustly demonstrated 23,24 . To date, genetic studies in COVID- 19 have focused on risk of severe disease, with replicated associations implicating antiviral defense mechanisms (notably involving interferon signaling), mediators of inflammatory organ damage, leucocyte differentiation and blood type antigen secretor status but limited evidence to date for HLA [25][26][27] . In this study, we used data from five clinical trials of ChAdOx1 nCoV-19 to demonstrate association of HLA-DQB1*06 with higher antibody responses against the RBD of spike antigen and lower risk of breakthrough infections, which we propose involves altered HLA peptide binding influencing memory B cell responses.

Genome-wide association study of antibody responses 28 days after ChAdOx1 nCoV-19 vaccination
We hypothesized that genetic factors contribute to inter-individual variation in COVID-19 vaccine responses. To investigate this, we first performed a discovery analysis testing for genetic association with vaccine responses in participants enrolled in the phase 1/2 (COV001) and phase 2/3 (COV002) randomized single-blind ChAdOx1 nCov-19 (AZD1222) vaccine efficacy trials, conducted within the UK, and in whom humoral immune responses were measured after vaccination. Figure 1 summarizes participant inclusion. DNA from 1,222 ChAdOx1 nCov-19 trial participants was genotyped on the Affymetrix AxiomTM HGCoV2 1 array. After quality control (Methods), 667,496 variants in 1,190 individuals were available for single-nucleotide polymorphism (SNP) variant imputation ( Supplementary Fig. 1). After imputation, 9,325,058 high-quality SNPs were tested for association with normalized antibody responses against spike and RBD (Extended Data Fig. 1) in 1,076 of the 1,190 genotyped individuals who had received ChAdOx1 nCoV-19 vaccine, with antibody measures available at 28 days after first vaccination (baseline demographics are shown in Table 1). We performed the association analysis adjusting for age, sex, prior Oxford-AstraZeneca) 4 , are estimated to have population effectiveness against a positive polymerase chain reaction (PCR) test for the earliest variants of SARS-CoV-2 of 79% and 80%, respectively, when assessed at least 21 days after the second dose of vaccination in a community-based household survey from the United Kingdom (UK) (1 December 2020 to 8 May 2021) 5 , together with 88-91% effectiveness against hospital admission for COVID-19 (ref. 5 ), although lower effectiveness is reported with more recent variants of concern 6 . Despite the success of vaccines at reducing mortality and morbidity in the population, with effectiveness against severe disease and hospitalization currently remaining high, vaccine breakthrough infections, although predominantly mild, are increasingly reported [7][8][9] .
Considerable variation in immune responses, including antibody levels and T cell responses, has been reported among vaccinated individuals 10 . Neutralizing antibody levels show association with vaccine efficacy in animal challenge studies 11 and humans 12,13 , and risk of symptomatic COVID-19 has been shown to reduce with increasing levels of both anti-spike (anti-S IgG) and antibodies against RBD antigenic sites on the viral spike (anti-RBD IgG) after vaccination with ChAdOx1 nCoV-19 (ref. 13 ). The reasons for inter-individual variation in total or neutralizing antibody responses are incompletely understood 10,14 .
Community-based surveys have provided some epidemiological insight into this question among individuals with no prior history of SARS-CoV-2 infection in the UK general population: a low anti-S IgG antibody responder group after vaccination was identified and found to be more commonly male, elderly (over 75 years of age) and with long-term health conditions 10 .
We sought to investigate the contribution of genetic factors to the observed variation in response to vaccination with ChAdOx1 nCoV-19. Antibody responses after vaccination show evidence of heritability 15 , with genetic variation in HLA within the MHC on chromosome 6 (position p21.3) associated with responses to hepatitis B [16][17][18][19] , tetanus 20 and measles 21,22 vaccines. For these infections, the relevance for vaccine   1 | Baseline characteristics for 1,076 ChAdOx1 nCoV-19 participants in the phase 1/2 (COV001) and phase 2/3 (COV002) trials who received ChAdOx1 nCoV-19 vaccine with genotype data available and for the 1,069 individuals who either experienced a breakthrough infection after 21 days or who did not experience a breakthrough up to time of censoring SARS-CoV-2 exposure based on anti-nucleocapsid (anti-N) IgG concentrations (n = 128, 11.9%) and antibody assay type (all as fixed-effect covariates) for all individuals irrespective of ancestry (Extended Data Fig. 2), including a genetic relatedness matrix (GRM) as a random effect covariate. The mixed model regression analysis revealed genome-wide significant associations (P < 5 × 10 −8 ) for both anti-spike (index variant rs9271374, P = 2.6 × 10 −8 , beta= −0.14 and s.e. = 0.03) and anti-RBD (rs1130456, P = 4.4 × 10 −10 , beta = −0.26 and s.e. = 0.04) IgG antibody levels. rs9271374 and rs1130456 are SNPs located within 10 kilobases (kb) of HLA-DQ genes (Fig. 2) and in linkage disequilibrium within our multi-ancestry cohort (r 2 = 0.65). The distribution of P values (Extended Data Fig. 3a) and beta coefficients (Extended Data Fig. 3b) for all genotyped and imputed variants across this locus show a clear correlation in genetic architecture between these two antibody responses (Spearman's rho coefficient 0.90 and 0.93 for P values and beta coefficients, respectively) correlated through linkage disequilibrium (measured through r 2 ). These genetic association signals may be falsely observed as a result of two important factors. First, although every effort was made to normalize the antibody levels while acknowledging the different platforms for antibody measurement, the final distributions still deviated from normality, which could increase the risk of detecting a genetic association signal by chance (Extended Data Fig. 1). Therefore, we performed a further round of inverse normal transformation on the pooled RBD-specific antibody distributions (Extended Data Fig. 1f) to create a merged, normalized distribution (Extended Data Fig. 4a) and re-ran the association analysis for anti-RBD antibodies. The rs1130456 association was still present and was the most significant association (P = 4.7 × 10 −9 ; Extended Data Fig. 4b-d). Second, given the diverse ancestry of individuals included, it is possible that the association could be a result of confounding due to population structure. The genomic inflation factor (λ) from our primary RBD association analysis was 1.023. When the extended MHC region was excluded, λ was 1.007, suggesting that much of the observed inflation was a result of the large number of variants associated with MHC (Extended Data Fig. 5a for Manhattan and Extended Data Fig. 5b for QQ). Furthermore, given the low levels of natural exposure to SARS-CoV-2 in our population at the time of sampling early in the pandemic, we used our data for anti-N IgG concentrations to again test for excessive inflation, using N as a negative control. No associations of genome-wide significance were observed, and λ was estimated at 1.017 (Extended Data Fig. 5c,d).
To further explore the effect of population structure, we re-ran our association analyses for RBD, including the first ten genetic principal components (PCs) (derived from the entire genotyped dataset) as additional fixed-effect covariates using our mixed-model approach (Extended Data Fig. 5e,f). λ using this approach was 0.985, suggesting a degree of overfitting, but, again, the same variant (rs1130456) remained most significantly associated with RBD-specific antibodies, albeit with a marginally attenuated P value (1.3 × 10 −9 ), as to be expected when including multiple additional covariates in the model.

HLA imputation and fine-mapping of associated variants
We proceeded to test for evidence of association with spike-specific and RBD-specific IgG antibodies at the level of HLA gene and protein variation. Imputation (Methods) identified 640 HLA alleles and 4,513 amino acid (AA) changes (of which 81 alleles and 3,027 AA substitutions were present in our dataset at a minor allele frequency of ≥0.05). To undertake fine-mapping, we identified 1,023 individuals with identity-by-descent (IBD) values of 0.185 or less, and those of self-reported, and principal component analysis (PCA)-derived European ancestry (using PC1 and PC2 cutoffs as shown in Extended Data Fig. 2, inset). Of all HLA and AA alleles tested for association with spike and RBD antibody levels, the HLA allele with the most significant association was HLA-DQB1*06 with anti-RBD antibodies (P = 3.2 × 10 −9 , beta = 0.27 and s.e. = 0.04; Supplementary Table 1). An AA variant had a P value identical to that of HLA-DQB1*06 (3.2 × 10 −9 ) but the exact inverse beta coefficient (−0.27). This AA variant (DQB1-125A/S) denotes the presence of either an alanine or a serine at position 125 of the HLA-DQB1 protein according to international ImMunoGeneTics (IMGT) project coordinates. HLA-DQB1*06 has a glycine at position 125, whereas other alleles common in our genotyped population possess either alanine (HLA-DQB1*02 and *04 alleles) or serine (HLA-DQB1*05). Thus, this AA variant is synonymous with  the presence of HLA-DQB1*06 in our dataset. The index-associated variant from the primary analysis (rs1130456) was equally associated with the anti-RBD titers in this analysis (beta = 0.27 and s.e. = 0.04).
Other variants imputed using the specific HLA imputation algorithm were identified as being marginally more significantly associated than rs1130456, with the new lead being rs9273817 (P = 2.4 × 10 −9 , beta = 0.27 and s.e. = 0.04).
To further understand the relationship between the top associated variants, we performed stepwise forward regression analysis incorporating the full set of SNP, AA and HLA allele variants from HLA imputation in the set of individuals restricted by IBD, PCs and self-reported ethnicity (Fig. 3a,b). Adjusting for the new top SNP (rs9273817) in the first round of conditional analysis, there was a complete abolition of the signals for both rs1130456 (P = 0.65) and HLA-DQB1*06 (P = 0.65), supporting the likelihood that these variants are all tagging the causal variant, most plausibly HLA-DQB1*06 (Fig. 3a,b, middle rows). A second, likely independent, signal of association was observed in HLA-DRB1, with the index variant being the presence of a glutamate or an arginine at position 71 (according to IMGT) of HLA-DRB1 (DRB1-71E/R, P conditional = 2.7 × 10 −4 , beta = −0.14 and s.e. = 0.04; Fig. 3a, middle row). After conditioning on this variant, no further independent signals with a P < 1 × 10 −2 were observed in the class II MHC region. Before proceeding with further fine-mapping, we confirmed that both the HLA-DQB1*06 and DRB1-71E/R associations were robust to statistical inflation by performing a Monte Carlo exact test with 10 8 permutations. The likelihood of both associations occurring by chance was still less than the number of permutations (that is, P < 2 × 10 −8 ), limited only by computational time for testing. Furthermore, we tested for evidence of both the HLA-DQB1*06 and DRB1-71E/R associations in the different population strata (including 928 European versus 148 non-European) individuals and observed evidence of the same trend of association in both groups, supporting further that this association is unlikely to be spurious due to population structure ( Supplementary Fig. 2).
To confirm that the HLA-DQB1*06 allele was the most likely primary gene locus associated with the antibody response, we used Bayesian information criterion (BIC) modeling with phased data. We found that variation in RBD antibody levels was best described using HLA-DQ alleles (HLA-DQA1*01/HLA-DQB1*06, BIC 2,715.2) rather than HLA-DRB1 (HLA-DRB1*15, BIC 2,719.1), supporting that the primary association was likely linked to HLA-DQ, rather than HLA-DR, variation ( Supplementary  Fig. 3). We next investigated whether there was evidence of interaction between the HLA-DQB1*06 and independent DRB1-71E/R associations. Using a likelihood ratio test (LRT) comparing the linear and interaction terms, we found no evidence of a complex inter-dependence between these two variants (P = 0.44; Supplementary Fig. 4). We, therefore, compared models describing variation in a simple linear additive model (that is, normalized anti-RBD antibody levels ~ HLA-DQB1*06 + DRB1-71E/R) compared to a model where we compared individuals grouped into the presence of one variant in the absence of the other (Fig. 3c), and we found that the latter was more parsimonious after adjusting for age, sex, five PCs and anti-RBD antibody measurement assay (BIC 2,965.42 versus 2,689.65, respectively). Thus, using this combined description of variation, we next tested for association of HLA-DQB1*06 with increased anti-RBD antibody levels accounting for DRB1-71E/R over the time course of the ChAdOx1 nCoV-19 trial. We found significant differences between the opposing DQB1*06 and DRB1-71E/R carrier groups seen at day of second dose (P = 2.7 × 10 −7 using the Student's t-test), at day 28 after second dose (P = 2.6 × 10 −7 ) and at day 90 after second dose (P = 0.01) (Fig. 4a). No significant difference was observed at day 182 after second dose. A summary of the baseline demographics of individuals stratified by HLA allele group is provided in Supplementary Table 5.

Replication of genetic association signals with COVID-19 vaccination antibody responses
To provide further evidence that the observed genetic associations were robust and not restricted to the COV001 and COV002 cohorts, we aimed to test for replication of the associations in a series of independently recruited cohorts (Extended Data Fig. 6). Three trials were coordinated after COV001 and COV002 to address questions regarding safety and immunogenicity of heterologous dosing of COVID-19 vaccines (COMCOV and COMCOV2) and to assess immunogenicity and safety in children (COV006). DNA from 1,722 individuals (638 COM-COV, 876 COMCOV2 and 208 COV006) were genotyped on the same Affymetrix AxiomTM HGCoV2 1 array with identical quality control and imputation pipelines applied as to the COV001 and COV002 cohorts. After quality control, data from 1,677 individuals (627 COMCOV, 847 COMCOV2 and 203 COV006) were available (Supplementary Table 6). These replication cohorts differed considerably from the ChAdOx1 nCoV-19 trials with respect to age, sex proportions and timing and nature of vaccination regimens, and only data for spike antibody levels (and not RBD) were available. Moreover, COMCOV was enriched for non-White reported ethnicities. Nevertheless, when using the same HLA-DQB1*06 and DRB1-71E/R classifications, we observed statistical evidence of association with anti-spike antibody levels measured at the time of second vaccination in the same direction as observed for COV001 and COV002 when looking at those individuals receiving both ChAdOx1 nCoV-19 or BNT162b2 as their first vaccine. Both unadjusted and adjusted models were used to compare the groups as follows.
When including all individuals irrespective of the first vaccine received (all individuals genotyped from COMCOV, COMCOV2 and COV006 received either ChAdOx1 nCoV-19 or BNT162b2 as their first vaccine), there was a significant difference in anti-spike antibody levels measured on the day of second vaccine (median of 67 days with a range of 28-184 days after the first vaccine) when comparing the group 'Carrying DRB1-71E/R with no DQB1*06' against both those 'Carrying DQB1*06 with no DRB1-71E/R' (P t = 5.4 × 10 −3 ; P value derived from the t-test) and the 'Remainder' of individuals (P t = 0.02; Supplementary  Fig. 5a). Significant differences in anti-spike antibody levels were also observed for those individuals given a first dose of BNT162b2 comparing the groups 'Carrying DRB1-71E/R with no DQB1*06' and those 'Carrying DQB1*06 with no DRB1-71E/R' (P t = 0.01) and the 'Remainder' of individuals (P t = 0.04; Supplementary Fig. 5b) and for ChAdOx1 nCoV-19 with significance observed only for the comparison between 'Carrying DRB1-71E/R with no DQB1*06' and 'Carrying DQB1*06 with no DRB1-71E/R' groups (P t = 0.04; Supplementary Fig. 5c). Using a linear model adjusting for age, sex, interval between vaccination and sampling, first vaccine, study and self-reported ethnicity, there was a significant difference in spike antibody levels observed for all individuals when testing the effect of HLA-DQB1*06 accounting for DRB1-71E/R (P LRT = 5.2 × 10 −3 using the LRT to compare with the null) and for those    individuals primed with BNT162b2 (P LRT = 0.02), with a trend toward significance for individuals primed with ChAdOx1 nCoV-19 (P LRT = 0.1). Similarly, we observed a significant effect in the same direction when performing the same analysis comparing individuals grouped by either HLA-DQB1*06 carriage alone (P LRT = 0.04) or DRB1-71E/R carriage alone (P LRT = 7.8 × 10 −4 ). All together, we found further evidence that carriage of HLA-DQB1*06 and the presence of either a glutamate or an arginine at position 71 in the HLA-DRB1 protein is associated with differential antibody responses to spike across individuals, irrespective of the nature of first vaccine and across ages and both self-reported White and non-White ethnicities. Given the correlation in antibody levels against spike and RBD in our discovery set, it is likely that our observed genetic signal would also be observed for RBD, even though RBD was not measured directly in the replication cohorts.

Testing for association of genetic variants on immune response over time and risk of breakthrough infection
Given the observed association between HLA variants with variable immunogenicity, we next  (Table 1 and Supplementary Table 7). We found that HLA-DQB1*06 was present in 33.9% of individuals experiencing breakthrough infection compared to 45.6% of individuals who did not have breakthrough infection (chi-squared P = 0.02). We subsequently found that individuals carrying HLA-DQB1*06 alleles were less likely to experience breakthrough infection over time compared to individuals who did not carry HLA-DQB1*06 after adjustment for age, sex, the first two genetic PCs (representing ethnicity), healthcare worker status, body mass index (BMI) and chronic disease status (adjusted hazard ratio (HR) = 0.63, 0.42-0.93, P = 0.02; Fig. 4b and Extended Data Fig. 7). We performed sample reweighting for dose and interval between first and second dose of vaccination (using inverse probability weighting) to ensure that our analyses were as consistent with prior correlates analyses as possible 13 . This significance persisted even after adjusting for whether individuals were likely to have been naturally exposed to SARS-CoV2 (determined using N measurements) and based on whether they were related to each other (IBD < 0.185) or not (P = 0.02). A similar effect was observed when describing individuals using our overall HLA status definition (that is, carrying HLA-DQB1*06 alleles accounting for DRB1-71E/R), although significance was lost (adjusted HR for the group 'Carrying DRB1*06 with no DRB1-71E/R' was 0.54, 0.26-1.1, P = 0.09; Fig. 4c and Extended Data Fig. 8 Fig. 9).

Structural insights into HLA spike peptide binding
Given the observed immunological and clinical impact of HLA-DQB1*06 on vaccine response and effectiveness, we next tested for structural evidence of binding of spike peptides by the associated HLA-DQB1*06 allele. We tested the hypothesis that HLA-DQB1*06:02 could bind peptides from SARS-CoV-2 spike more effectively than an alternative HLA-DQB1 allele that was both common in the population and linked with another HLA-DQA1*01 allele. Using the COV001/ COV002 data, we identified HLA-DQB1*05:01 as an allele that would act as a suitable comparator (frequency in COV001/COV002 = 12%, beta for association with 28-day RBD levels = −0.14, s.e. = 0.05 and P = 0.01, acknowledged to commonly pair with HLA-DQA1*01:01 (ref. 28 )). Other HLA alleles that were common in our merged COV001/ COV002 dataset were less suitable for this analysis. DQB1*03, for example, (frequency, 34%) pairs more commonly with DQA1*03 or *05 alleles, whereas DQB1*02 (23%) pairs with DQA1*02 or DQA1*05. The only available crystal structure for DQB1*06 alleles is the HLA-DQA1*01:02-HLA-DQB1*06:02 in complex with a hypocretin peptide (LPSTKVSWAAV) 29 . The key sidechains of Ser (position (P) 3), Thr (P4) and Val (P6) of the peptide are buried in the center of the groove formed by two HLA molecules (Extended Data Fig. 10a). Thus, we searched for a hypocretin-like peptide motif (Ser/Cys) ThrXVal in spike protein (where X is any amino acid with its sidechain pointing away from the groove; Ser and Cys differ in one atom only). Spike residues Val615-NCTEVPVAI-His625 could be aligned with a hypocretin peptide and, thus, enabled us to model a complex of HLA-DQA1*01:02-HLA-DQB1*06:02 bound to the spike peptide using AlphaFold 30 (Fig. 5a). Both the AlphaFold-based model and the crystal structure support DQB1*06:02 interacting differently with any peptide compared to DQB1*05:01. DQB1*05:01 differs from DQB1*06:02 by at least three key residues forming hydrogen bonds with the bound hypocretin peptide (Extended Data Fig. 10b-d), making analogous DQB1*05:01peptide interactions impossible. Our analysis identifies specific residues of DQB1*05:01 and DQB1*06:02 responsible for different peptide recognition and subsequent recognition by T cell receptors.

Immunological implications of the observed MHC associations
To further support these observations, we used peripheral blood mononuclear cells (PBMCs) available from a small number of participants from COV001 and COV002 to compare anti-spike-specific memory B cell responses at day 84 after the first vaccine in ten individuals carrying HLA-DQB1*06 and ten individuals not carrying HLA-DQB1*06. We observed an increase in anti-spike memory B cell responses in the individuals carrying HLA-DQB1*06 (P = 0.05 using a one-tailed Wilcoxon rank test) at day 84 that was not apparent at day 0 (Fig. 5b). We then searched for similar signals of association in the intermediate components of the MHC-T-B-antibody axis. We observed a difference in overall CD4 proliferation in response to stimulation with S1 (that includes the RBD domain and the putative Val615-NCTEVPVAI-His625 peptide, P = 0.01; Fig. 5c) but not against S2 (cleaved away before residue 686). We did not observe an equivalent signal with antigen-specific T cell activation (using the activation-induced marker (AIM) assay; Fig. 5d).

Discussion
Our findings show that individuals carrying HLA-DQB1*06 alleles have higher antibody responses against SARS-CoV-2 spike protein and the RBD after vaccination with both ChAdOx1 nCoV-19 and BNT162b2 vaccines than individuals not carrying this allele. HLA-DQB1*06 is also associated with a reduced risk of breakthrough infection based on PCR positivity after a median 494 days of follow-up after receiving their first dose of vaccine. To our knowledge, this is the first report of an HLA association with antibody responses after immunization with SARS-COV-2 vaccines and of a genetic association with risk of SARS-CoV-2 breakthrough infection [15][16][17][18][22][23][24]31 . We further provide a working mechanistic hypothesis for the primary HLA-DQB1*06 association of potentially distinct peptide binding that may lead to improved CD4 + T cell proliferation and memory B cell activation. Our study design comprised an infection-naive and vaccine-naive population in a clinical trial setting with appropriate blinding, detailed immune phenotyping and patient follow-up at defined timepoints, and we further substantiated our findings in a large replication dataset with preliminary follow-up functional experiments. The global evidence of breakthrough infections after vaccination, of changes in immune correlates of protection over time and of new SARS-CoV-2 variants highlight the importance of subsequent dosing of vaccination and understanding how this can be optimally deployed 9,13,32 . Our study demonstrates that there is a heritable component to observed inter-individual variation in antibody response at day 28 after first dose but also throughout time after vaccination, across vaccinee age and type of first vaccine. Given that these effects do persist over time, with some change in effect size, and have clinical relevance in terms of risk of breakthrough infection, the observed HLA associations raise the potential utility of prioritizing at-risk populations based, for example, on HLA-DQB1*06 allele frequencies, among whom more intensive booster vaccination may be warranted. Variable HLA-DQB1*06 allele frequencies are reported across diverse populations 33-36 but there is not yet robust epidemiological evidence of the extent of breakthrough infections in such populations, and this would require further investigation before implementation of such an intervention. The observed reduction in effect size on log 10 -transformed RBD-specific antibody levels in the ChAdOx1 nCoV-19 vaccinees over time from 0.38 at day 28, to 0.32 at boost, to 0.17 at day 28 after boost and to 0.12 at day 90 may represent a true reduction in effect of genetic variation over time or could also be a result of the limit of detection and dynamic range of the antibody assay. Further re-analyses using recalibrated assay detection systems would be necessary to resolve this issue.
Although we have provided preliminary evidence for our mechanistic hypothesis for HLA-DQB1*06, further studies to understand the structure-function relationships based on the specific allele/peptide predictions and T cell biology will be required. Previous genetic studies for non-SARS-CoV-2 vaccines implicating the MHC suggest both HLA peptide binding 23,37 and non-HLA effects relating, for example, to differential gene expression or complement activation 24 , as potential underlying mechanisms for observed genetic associations. For SARS-CoV-2 vaccine response, consideration of HLA genotype has been advocated in vaccine design based on predicted antigens presented to T cells across different ethnic groups to maximize efficacy based on T cell immunity, with potential utility as a booster agent to strengthen immune responses 38 . Limitations of the study include the need for further replication of the genetic association in other studies and populations and the representativeness of the trial population to the wider UK and global population. We propose that there is an urgent need to investigate these associations further in diverse ethnic groups and individuals of varying comorbidity to maximize insights and potential utility of the observed associations. There is also a requirement for mechanistic studies to further understand the functional basis of the association and the relationship with specific SARS-CoV-2 variants. A further limitation is the extent to which fine-mapping the association to specific variants and modulated genes was possible, reflecting the high level of sequence and structural polymorphism, sequence homologies and complex linkage disequilibrium in the MHC 39 . Only two antibody responses were analyzed, with a greater antibody repertoire and T cell immune response assays and other aspects of cell-mediated immunity important to include in future studies.
We propose that, to inform vaccine design and implementation against COVID-19 and other vaccine-preventable diseases with products either established or in development, an understanding of the impact of human genetics should be prioritized to deliver translational outputs for the long-term benefit of populations worldwide.

Online content
Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41591-022-02078-6.

Study design and participants
Discovery cohort. The participants were enrolled in phase 1/2 (COV001) or phase 2/3 (COV002) randomized single-blind ChAdOx1 nCoV-19 (AZD1222) vaccine multi-center efficacy trials conducted across multiple sites within the UK 4,40 (NCT04324606 and NCT04400838). In brief, after written informed consent, adults aged 18 years and older were randomly assigned to receive either intramuscular ChAdOx1 nCoV-19 (AZD1222) or a control vaccine (MenACWY) to assess the safety and efficacy of the ChAdOx1 nCoV-19 vaccine against SARS-CoV-2 (refs. 4,40 ). All individuals included in the analyses agreed to being included in genetic studies as part of the vaccine trial consent, with the opportunity to opt out. The trials were conducted according to the principles of Good Clinical Practice and approved by the South Central Berkshire Research Ethics Committee (20/SC/0145 and 20/SC/0179) and the UK regulatory agency (the Medicines and Healthcare products Regulatory Agency). Individuals with humoral immune responses measured after vaccination were selected for this genotyping study.
To maintain blinding of investigators to vaccine allocation-before vaccine trial reporting-participants who received the control vaccine were also genotyped, at a ratio of 1:10.

Breakthrough infections.
Participants received weekly reminders to report any primary symptoms of COVID-19 (cough, shortness of breath, fever, anosmia or ageusia), and, if symptomatic, a SARS-CoV-2 NAAT was performed on a nose and throat swab. Participants were also asked to return a weekly nose and throat swab for NAAT for the duration of the study. A breakthrough infection was defined as SARS-CoV-2 NAAT-positive swab at least 22 days after a first dose of vaccine. If participants were NAAT-positive but had symptoms other than the five primary COVID-19 described above, they were categorized as non-primary symptomatic cases but still included in the final breakthrough analyses reported herein.
Replication cohort. The replication cohort comprised participants from three COVID-19 vaccine trials conducted across several sites within the UK [41][42][43] . Two of these trials (COMCOV and COMCOV2) were in adults aged 50 years and older, randomized to receive homologous or heterologous two-dose schedules of intramuscular ChAdOx1 nCoV-19, mRNA vaccines (BNT162b2 or mRNA-1273) or a nanoparticle vaccine (NVX-CoV2373) 42,43 . The other trial (COV006) was in children aged 6-17 years who were randomized to receive either intramuscular ChAdOx1 nCoV-19 or a control vaccine (capsular group B meningococcal vaccine, 4CMenB) 41 . All participants included in this manuscript consented (or their parents/guardians) to being included in genetic studies as part of the vaccine trial consent, with the opportunity to opt out in the consent form.

Breakthrough infections.
Adult participants in the replication cohort self-reported breakthrough COVID-19 (no active COVID-19 testing as part of study) and associated symptoms. Parents/guardians reported PCR-confirmed or lateral flow assay-confirmed SARS-CoV-2 infections for the childhood participants. We defined a breakthrough infection for this cohort as a self-reported case of COVID-19 at least 22 days after a first dose of vaccine.

Antibody concentrations
Discovery cohort. Blood samples for serological testing were taken at baseline, day 28 after the first dose of vaccine, before the second vaccine and then at days 28, 90 and 182 after the second dose of vaccine. Day 28 post-first-vaccine responses were available on all participants, and the variance of response across the cohort was substantial, and together this influenced the choice for the discovery genome-wide association study (GWAS). Humoral immune responses were assessed using Meso Scale Discovery (MSD) multiplexed immunoassay against SARS-CoV-2 spike and RBD as well as the N proteins-in the entire phase 1/2 UK cohort (n = 585) and 15% of the phase 2/3 UK cohort (n = 637). Sample selection from the phase 2/3 trial was based on samples processed for the initial application for emergency use, which required 15% of samples included in the efficacy trial to be processed on validated assays. In addition, serological testing was also performed on samples from NAAT-positive individuals-missing data were assumed to be missing at random. Anti-spike, RBD and N IgG levels were measured by a multiplex immunoassay using the MSD platforms at two laboratories; the phase 1/2 samples were performed at MSD, and the phase 2/3 samples were performed at PPD Laboratories. Ninety-seven samples were assessed by both laboratories to evaluate inter-assay agreement.
Replication cohort. The replication cohort had blood samples for serological testing taken at baseline participation in the study, which was 28-84 days after the first dose of vaccine and on the day of receiving the second dose of vaccine. Adult participants in the replication cohort had their SARS-CoV-2 anti-spike IgG levels measured by enzyme-linked immunosorbent assay (ELISA) at Nexelis (Laval, Canada). In the childhood replication study, anti-spike IgG levels were measured using MSD at PPD Laboratories before receipt of any further vaccine dose.

DNA extraction
DNA was extracted from either blood clots remaining after serum separation by centrifugation or whole blood collected in EDTA tubes. In brief, clots were dispersed by centrifugation through clot spin baskets (Qiagen, 158932). ATL buffer (Qiagen, 1014758) was added to the centrifuged clot and vortexed. Proteinase K (Qiagen, 19131) was added, vortexed thoroughly and incubated in a shaking incubator at 56 °C until the clot was completely lysed (overnight). Afterr lysis, AL buffer was added (Qiagen, 1038826) and vortexed thoroughly. Lysate or whole blood was then transferred to the QIAsymphony 2.0 and extracted using the QIAsymphony DSP DNA Midi Kit (Qiagen, 937255).

Genotyping
Standardized DNA was sent to the National Institute for Health Research UK BioCentre and genotyped using their established pipelines on the Affymetrix AxiomTM HGCoV2 1 array. Raw CEL files were transferred back to Oxford for file conversion into build 38 using the Axiom Analysis Suite Best Practice Workflow. Individual samples and SNP variants were exported and underwent further quality control using PLINK 44 (version 1.9).
Individuals were excluded if more than 3% of SNPs were classed as missing or if derived genetic sex did not match reported sex or if levels of heterozygosity lay more than three times the standard deviation from the mean of individuals stratified by self-reported ethnicity. IBD was calculated for all pairs of individuals, and individuals were excluded if they had estimates ≥0.9, excluding the individual with the highest SNP missingness rate from each pair preferentially. Hardy-Weinberg estimates were calculated for each SNP within individuals classified as founders (with IBD < 0.05), and SNPs were excluded if exhibiting extreme deviations from equilibrium (using a threshold of P < 1 × 10 −50 calculated in PLINK 1.9).
PCs were calculated for all individuals both within the genotyped COVID-19 vaccine cohorts and merged with individuals from the 1000 Genomes Project derived from Great Britain (GBR), Han individuals from China (CHS) and African individuals from Kenya (LWK) and Nigeria (YRI). Before merging vaccinee data with 1000 Genomes individuals, SNPs were first lifted over from build 38 to build 37 coordinates using LiftOver (https://genome.sph.umich.edu/wiki/LiftOver). Plots were inspected to detect samples with extreme outlier values (>3 standard deviations from any expected cluster). A European cluster was defined by including only individuals falling within a defined cluster with GBR individuals and self-reporting as White ethnicity. Quality-controlled Article https://doi.org/10.1038/s41591-022-02078-6 SNPs and individuals were taken forward for genotype imputation, which was undertaken on the Michigan Imputation Server using the TopMed reference panel using recommended settings 45 . Files were converted to MACH format using DosageConvertor (version 1.0.4, https://genome.sph.umich.edu/wiki/DosageConvertor).
HLA imputation was performed using the Multi-Ethnic HLA reference panel (version 1.0 2021) available on the Michigan Imputation Server 46 using recommended settings. Phasing of multi-allelic HLA alleles was undertaken using PHASE (version 2.1.1) 47 . HLA typing was also performed using PCR sequence-specific primers (SSPs) at the Histogenetic laboratory, Oxford University NHS Foundation Trust.

Follow-up functional assays
Samples at baseline (day 0) and 28 days after boost (day 84) from 20 healthy adult volunteers participating in COV001 and COV002 who had received a two-dose ChAdOx1 nCoV-19 vaccine schedule of either two standard doses or one standard dose and one low dose were chosen for further functional work. Samples were evenly stratified by dose of second vaccine across individuals who either carried HLA-DQB1*06 (with no HLA-DQB1*05) or carried HLA-DQB1*05 (with no HLA-DQB1*06).
For the proliferation assay, cryopreserved PBMCs were washed twice with sterile DPBS and incubated with CellTrace Violet (Life Technologies), a free amine binding dye, at a final concentration of 2.5 μM in PBS for 10 minutes at room temperatuire, protected from light. To quench any dye remaining in solution, cells were incubated with FBS for 5 minutes at 4 °C. Cells were then resuspended in RPMI-1640 media supplemented with 1% L-glutamine, 1% penicillin-streptomycin and 10% human AB serum (Sigma-Aldrich), and 250,000 cells were seeded per condition in U-bottom 96-well tissue culture plates. Cells were stimulated with 15-mer peptides overlapping by ten AAs, comprising the length of the S1 or the S2 domain of the SARS-CoV-2 spike protein (ProImmune) at a final concentration of 1 μg ml −1 . Two μg ml −1 of PHA served as a positive control, and cells incubated with 0.13% DMSO (Sigma-Aldrich) in cell media were used as a negative control. Cells were then incubated for 7 days at 37 °C, with 5% CO 2 and 95% humidity, and the media was changed on day 4. At the end of the incubation period, cells were stained with anti-human CD3-FITC (OKT3, Invitrogen), CD4-APC (RPA-T4, Invitrogen), CD8-PE-Cyanine7 (OKT8, Invitrogen) and Live/Dead Fixable Near-IR Stain (Invitrogen) before acquisition on a five-laser LSRFortessa X-20 flow cytometer (BD Biosciences) using FACSDiva version 8.02 (BD Biosciences), and data were analyzed in FlowJo version 10.7. A hierarchical gating structure was applied, and data are presented as background subtracted.

Statistical analyses
Discovery cohort. Antibody responses were log-transformed and density distribution plots inspected for overlap in density distributions between laboratories (MSD or PPD) and paired correlation between assays using results available from both laboratories performed on samples from the same individual. If the paired correlation was less than 0.7, or the density distributions did not overlap, traits were tested for association within assay type and then meta-analyzed, but if the paired correlation was greater than 0.7, and the density distributions overlapped, traits were quintile normalized (using the qqnorm() function in R) within assay platform groups and tested in a pooled analysis including assay type as a fixed-effect covariate. Samples taken at day 28 after first dose were used for the primary analysis. A linear mixed model was used to maximize power and account for the diverse population structure and potential unrecognized close (cryptic) relatedness between study participants. Age, sex, antibody assay platform and N positivity were included as fixed-effect covariates for each association. Genotype and HLA-wide association analyses were performed using the 'mlma' function in GCTA (version 1.24.4) 51 after generating a GRM using non-pruned genotyped SNPs. The GRM was modeled as a random effect covariate with age (in years), sex, antibody assay laboratory and likelihood of natural exposure (based on anti-N protein antibodies (anti-N) > 1,000) coded as a binary variable included as fixed-effect covariates for the primary GWAS.
Sensitivity analyses for the genetic components included a further round of association analyses incorporating the first ten PCs for all individuals calculated within-cohort and a further round of normalization on the within-assay normalized RBD distributions.
Comparisons between imputed and classically typed HLA alleles were undertaken at the four-digit (that is, two-field) level of resolution. If a classically typed available call at a single allele locus included several potential higher-resolution alleles (that is, a list of potential ambiguities), only the first available allele call (adhering to a Common and Well-Documented priority) was used for comparison. If types were available to six-digit or eight-digit resolution, the calls were reduced to four-digit resolution for comparison between call types. The classical types were treated as the 'truth' set. By comparing each individual Nature Medicine Article https://doi.org/10.1038/s41591-022-02078-6 allele in turn, it was possible to define calls of the imputed (or 'test') set that were: • True positives (TPs) • False positives (FPs); called by the test as that allele when it was, in fact, another allele according to the truth • False negatives (FNs); called by the test as another allele when it was, in fact, this allele according to the truth • True negatives (TNs) Thus, at the level of an individual allele, various metrics could be calculated. Sensitivity was defined as: Specificity was defined as: Positive predictive value (PPV) was defined as: Negative predictive value (NPV) was defined as: Accuracy was defined as: Concordance was calculated at the level of the locus. For every pair of chromosomes with data available in both truth and test sets, the number of identical allele calls between platforms was calculated and divided by the total number of alleles, equivalent to the PPV. Any individual with missing alleles on either or both chromosomes on either platform was excluded from these calculations.
Univariate P values were calculated using the Wilcoxon rank test for continuous variables and the chi-squared test for nominal variables. Permutation of Wilcoxon rank test of associating HLA alleles with RBD response was performed using the 'perm' package in R 52 . Adjusted Cox proportional hazards regression analyses for the breakthrough analyses were performed using the 'survival' 53 package in R, and plots were generated using 'survminer' and 'ggplot2'. The breakthrough infection definition above for the discovery cohorts was used as the outcome in the proportional hazards model with individuals censored at time of breakthrough, date of withdrawal or 10 October 2021, whichever came first. The effect of carriage of HLA-DQB1*06 was estimated after adjustment for age (in years measured at baseline), healthcare worker status (defining whether individuals were healthcare workers and whether they cared for one or more than one patient with COVID-19 per week), BMI (less than 30 kg m −2 or equal to or greater than 30 kg m −2 ), ancestry (using the first two PCs of genetic variation to capture genetic structure) and chronic health condition (presence of none or one or more chronic respiratory, cardiovascular or diabetic health conditions). Using analyses undertaken on understanding the correlates of protection of ChAdOx1 nCoV-19 (ref. 13 ), we aimed to perform an identical modeling exercise, and, thus, samples were reweighted based on the interval between first and second vaccines (no second dose, <6, 6-8, 9-11 and ≥12 weeks) and dose arm of trial (standard dose (SD) alone, SD/SD, low dose (LD)/SD and SD/LD) distributions of individuals within the entire genotyped set with antibody data available using inverse probability weighting calculated using the 'ipw' package in R 54 . All analyses were undertaken in R version 4.1.1, except estimation of the genomic inflation factor (λ), which was undertaken in R version 3.6.2 using the GenABEL package 55 .
Replication cohort. Individuals from all three replication cohorts were stratified by HLA-DQB1*06 and DRB1-71E/R status and tested for association with log 10 -transformed anti-spike antibody levels measured after the first vaccine dose (and before the second vaccine) using linear regression adjusting for age, sex, self-reported ethnicity, priming vaccine, study and interval between prime and blood sample in days. Individuals were either analyzed together or stratified according to first vaccine dose received (ChAdOx1 nCoV-19 or BNT162b2). Survival analyses were performed using the same methods as for the discovery cohorts but including only age, sex and first vaccine type received as covariates, with reweighting performed using interval between first and second vaccines in days. Censoring was undertaken at point of breakthrough infection, withdrawal from study or date of database locking (21 January 2022 for COMCOV/COMCOV2 or 29 November 2021 for COV006), whichever came soonest. Healthcare worker status, BMI and chronic health condition information was not available for COV006, and so the variables were not included in the final replication analysis. All replication analyses were performed in R version 4.1.1.

Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability
The University of Oxford is committed to providing access to anonymized data for non-commercial research. Participant genotype and phenotype data will be deposited in the European Genome-phenome Archive and will be made available for non-commercial use only (accession number EGAS00001006909).  panels a-c), receptor binding domain (RBD, panels d-f) and nucleocapsid (N, panels g-i)) were measured for COV001 and COV002 participants in two laboratories: MSD and PPD. For 97 individuals, measures were available from both laboratories. The density distributions of measures from each laboratory for each antigen (panels a, d and g) and the correlation between intra-individual measures (panels b rho 0.98, e rho 0.98, and h rho 0.90) were inspected. If the density distributions overlapped and the intra-individual correlation (r 2 ) was greater than 0.7, the original distributions were taken forward for analysis (as for S, panel c). However if r 2 was less than 0.7, the quintile normalised intra-laboratory measures were taken forwards for analysis (panels f and i).