A common allele of HLA is associated with asymptomatic SARS-CoV-2 infection

Studies have demonstrated that at least 20% of individuals infected with SARS-CoV-2 remain asymptomatic [1][2][3][4] . Although most global efforts have focused on severe illness in COVID-19, examining asymptomatic infection provides a unique opportunity to consider early immunological features that promote rapid viral clearance. Here, postulating that variation in the human leukocyte antigen (HLA) loci may underlie processes mediating asymptomatic infection, we enrolled 29,947 individuals, for whom high-resolution HLA genotyping data were available, in a smartphone-based study designed to track COVID-19 symptoms and outcomes. Our discovery cohort (n = 1,428) comprised unvaccinated individuals who reported a positive test result for SARS-CoV-2. We tested for association of five HLA loci with disease course and identified a strong association between HLA-B*15:01 and asymptomatic infection, observed in two independent cohorts. Suggesting that this genetic association is due to pre-existing T cell immunity, we show that T cells from pre-pandemic samples from individuals carrying HLA-B*15:01 were reactive to the immunodominant SARS-CoV-2 S-derived peptide NQKLIANQF. The majority of the reactive T cells displayed a memory phenotype, were highly polyfunctional and were cross-reactive to a peptide derived from seasonal coronaviruses. The crystal structure of HLA-B*15:01–peptide complexes demonstrates that the peptides NQKLIANQF and NQKLIANAF (from OC43-CoV and HKU1-CoV) share a similar ability to be stabilized and presented by HLA-B*15:01. Finally, we show that the structural similarity of the peptides underpins T cell cross-reactivity of high-affinity public T cell receptors, providing the molecular basis for HLA-B*15:01-mediated pre-existing immunity.

Despite some inconsistent reporting of symptoms 1 , studies have shown that at least 20% of individuals infected with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) remain asymptomatic [2][3][4] . The examination of asymptomatic infection provides a unique opportunity to consider early disease and immunological features that promote rapid viral clearance. Specific focus on asymptomatic infection has the potential to further our understanding of disease pathogenesis and supports ongoing efforts towards vaccine development and the identification of potential therapeutic targets.
It remains unclear why many individuals successfully clear infection without major complications while others develop severe disease, even without known risk factors for severe COVID-19 outcomes 5 . However, host genetics is known to be implicated in differential immunological responses to infection and disease progression. Numerous studies intending to understand the genetic basis of differential outcomes in COVID-19 have been underway since nearly the start of the global pandemic, including the multicentre Host Genetics Initiative 6 . However, the vast majority of these studies have examined genetic associations with severe disease course, in primarily hospitalized cohorts 7,8 . As a result, although most individuals infected with SARS-CoV-2 experience mild disease course or are entirely asymptomatic, very few studies have examined genetics in the context of non-hospitalized, prospective, community-based cohorts.
The human leukocyte antigen (HLA) region (6p21) is the most polymorphic and medically important human genomic region. Variation in HLA has been associated with hundreds of diseases and conditions, including infection. Among the many genes involved in human immune responses, HLA variants have among the strongest associations with viral infections. For example, HLA is associated with rapid progression and viral load control of human immunodeficiency virus (HIV) 9 , hepatitis B, hepatitis C and other infectious diseases 10 . Notably, HLA class I and class II alleles have also been associated with the severe acute respiratory syndrome caused by SARS-CoV [11][12][13] .
Nature | Vol 620 | 3 August 2023 | 129
In silico analyses have pointed to HLA as relevant molecules for SARS-CoV-2 risk and essential targets for vaccine development [14][15][16][17] . For example, HLA-B*46:01 has low predicted binding to peptides of SARS-CoV-2, suggesting that individuals expressing this molecule may be more vulnerable to COVID-19 16 , corroborating previous results showing HLA-B*46:01 association with SARS risk 12 . By contrast, HLA-B*15:03 was predicted to protect against COVID-19 by presenting highly conserved SARS-CoV-2 peptides to T cells 16 . More recently, it was demonstrated that, although there is some overlap, many SARS-CoV-2 epitopes for CD8 + T cells are HLA specific 16 . To date, relatively few studies have directly examined HLA associations with infection, with mixed and inconclusive results in relatively small cohorts [18][19][20] . Larger studies that relied on genome-wide data to impute HLA did not find robust associations with disease 7,21 ; however, these studies focused primarily on hospitalized patients with a severe disease course.
Understanding the impact of HLA variation in disease promises to provide meaningful insights that are relevant to understanding the immunopathogenesis of COVID-19, while informing vaccine development and potential immunotherapies. Here we present a large study directly examining HLA variation in the context of primarily mild disease. We invited volunteer bone marrow donors, for whom high-resolution HLA genotyping data were already available, to participate in the COVID-19 Citizen Science Study, a smartphone-based study designed to track COVID-19 symptoms and outcomes, including self-reported positive tests for SARS-CoV-2 infection. This allowed us to develop a prospective cohort currently comprising nearly 30,000 individuals, which we complemented with two additional independent cohorts. We further contextualize our findings by examining T cell reactivity, T cell receptor repertoire, affinity and structural implications for the observed HLA associations. Our results provide strong support for the role of HLA class I in viral clearance leading to asymptomatic infection among individuals with SARS-CoV-2 infection and provide an important framework for additional studies aimed at revealing the immunological and genetic basis for recovery from SARS-CoV-2 infection.

HLA-B*15:01 in asymptomatic COVID-19
Our final cohort comprised 1,428 individuals who reported a positive test for active SARS-CoV-2 infection and self-identified as white. Basic demographics for all individuals are given in Extended Data Table 1. The full list of reported diseases and conditions and their frequency in this cohort is given in Supplementary Table 1. To identify whether HLA variation affects the likelihood of an individual remaining asymptomatic after SARS-CoV-2 infection, we analysed high-resolution genotyping data for five highly polymorphic HLA class I and class II genes (HLA-A, HLA-B, HLA-C, HLA-DRB1, HLA-DQB1).
To adjust for the effect of comorbid conditions, as well as sex and age differences, we fitted a series of regression models but did not find any effect of patient-reported comorbidities on the likelihood of asymptomatic disease. Thus, our final model was adjusted only for age and sex, again showing a significant association of HLA-B*15:01 with asymptomatic infection after adjustment for these variables (OR = 2.40, 95% CI = 1.54-3.64, P = 5.67 × 10^−5, P_adj = 0.003; Table 1).
Finally, we observed a strong additive effect for the associated genotype. Individuals carrying two copies of HLA-B*15:01 are more than eight times more likely to remain asymptomatic than individuals carrying other genotypes (OR = 8.58, 95% CI = 1.74-34.43, P = 0.001). Overall, one in five individuals (20%) who remained asymptomatic after infection carried HLA-B*15:01, compared with 9% among patients reporting symptoms.
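The carrier frequencies quoted above allow a rough consistency check on the reported effect size. The sketch below computes a crude, unadjusted odds ratio from the rounded percentages (20% and 9%). It is not the age- and sex-adjusted regression model behind the reported OR, and the function name is purely illustrative.

```python
def odds_ratio_from_frequencies(p_case: float, p_control: float) -> float:
    """Crude odds ratio from the carrier frequency in two groups."""
    if not (0 < p_case < 1 and 0 < p_control < 1):
        raise ValueError("frequencies must lie strictly between 0 and 1")
    odds_case = p_case / (1 - p_case)
    odds_control = p_control / (1 - p_control)
    return odds_case / odds_control


# Rounded carrier frequencies from the text (asymptomatic vs symptomatic).
crude_or = odds_ratio_from_frequencies(0.20, 0.09)
print(f"crude OR = {crude_or:.2f}")
```

The crude value comes out at roughly 2.5, which is in line with the adjusted estimate of 2.40, as expected if age and sex have little confounding effect.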
After adjusting for sex and age, only the combination of HLA-B*15:01 and HLA-DRB1*04:01 remained significant after correction for multiple comparisons (P = 3 × 10^−4, P_adj = 0.01). We found an OR for this combination (OR = 3.17, 95% CI = 1.65-5.80) that exceeds that for HLA-B*15:01 alone, suggesting that, although not significantly associated with the asymptomatic infection on its own in this cohort, the class II allele HLA-DRB1*04:01 enhances the effect of HLA-B*15:01.

Replication in independent cohorts
To confirm our findings, we examined two independent cohorts of patients with European ancestry. We first undertook a reanalysis of the primary HLA genotype data in a UK cohort that was previously reported 22 ; in that report, HLA-B*15:01 was not examined with respect to asymptomatic infection owing to its marginal significance in their analyses. Testing only the allele of interest, we found that HLA-B*15:01 is strongly associated with asymptomatic infection in this cohort when adjusting for sex and age (P = 0.02, OR = 3.56, 95% CI = 1.15-10.94). Similar to our discovery cohort, we found that the carrier frequency for HLA-B*15:01 was 17% in asymptomatic individuals compared with 7% in symptomatic patients (Table 1). We next examined the association between HLA-B*15:01 and asymptomatic infection in the combined UCSF prospective longitudinal COVID-19 Host Immune Response Pathogenesis (CHIRP) and Long-term Impact of Infection with Novel Coronavirus (LIINC) cohort. Here, 12 out of 82 individuals with European ancestry were identified as having an asymptomatic disease course. We found again that the carrier frequency of HLA-B*15:01 was exceptionally high (25%) in asymptomatic individuals compared with symptomatic patients (8.6%). Although the power of this analysis was limited by the sample size, so that it cannot be considered a fully independent replication, the findings trend strongly in support of our finding of a strong association of this allele with asymptomatic disease (P = 0.13, OR = 3.44, 95% CI = 0.50-23.64; Table 1). Finally, a meta-analysis across all three datasets (Citizen Science, UK, CHIRP/LIINC) confirmed the strong and consistent association of HLA-B*15:01 with asymptomatic infection (P < 10^−4, OR = 2.55, 95% CI = 1.73-3.77; Supplementary Fig. 1).
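To illustrate how the three cohort estimates reported above combine, the following sketch pools them with a fixed-effect, inverse-variance model. This model is an assumption, since the text does not state which meta-analytic method was used. Standard errors are recovered from the published 95% confidence intervals, and all function names are illustrative.

```python
import math


def log_or_and_se(or_value, ci_low, ci_high, z=1.96):
    """Recover log(OR) and its standard error from a 95% confidence interval."""
    log_or = math.log(or_value)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return log_or, se


def fixed_effect_pool(studies, z=1.96):
    """Inverse-variance fixed-effect pooling of (OR, CI low, CI high) tuples."""
    weights, weighted_logs = [], []
    for or_value, ci_low, ci_high in studies:
        log_or, se = log_or_and_se(or_value, ci_low, ci_high, z)
        weight = 1.0 / se ** 2
        weights.append(weight)
        weighted_logs.append(weight * log_or)
    pooled_log = sum(weighted_logs) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return (
        math.exp(pooled_log),
        math.exp(pooled_log - z * pooled_se),
        math.exp(pooled_log + z * pooled_se),
    )


# Estimates as reported in the text (OR, 95% CI low, 95% CI high).
studies = [
    (2.40, 1.54, 3.64),   # discovery cohort (age- and sex-adjusted)
    (3.56, 1.15, 10.94),  # UK cohort
    (3.44, 0.50, 23.64),  # CHIRP/LIINC cohort
]
pooled_or, pooled_low, pooled_high = fixed_effect_pool(studies)
print(f"pooled OR = {pooled_or:.2f} ({pooled_low:.2f}-{pooled_high:.2f})")
```

Under these assumptions the pooled estimate lands close to the reported meta-analysis result. The discovery cohort dominates the weighting because its confidence interval is by far the narrowest.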

Unexposed T cells react to SARS-CoV-2
Owing to their high avidity for their cognate T cell receptors, pHLA (peptide-HLA) tetramers have been systematically used to visualize and quantify low-frequency antigen-specific T cells ex vivo using flow cytometry 23 . We focused on four SARS-CoV-2 epitopes (CVADYSVLY, HVGEIPVAY, NQKLIANQF and RVAGDSGFAAY) that were previously shown to elicit cellular immunity mediated by cytotoxic CD8 + T cells in patients with COVID-19 carrying HLA-B*15:01 (refs. 24-28). We next performed ex vivo pHLA tetramer evaluation with the SARS-CoV-2 peptides to detect antigen-specific CD8 + T cells in nine pre-pandemic peripheral blood mononuclear cell (PBMC) samples. We observed tetramer + CD8 + T cells for three of the SARS-CoV-2 epitopes (Fig. 1a,b, Supplementary Tables 5 and 6 and Supplementary Figs. 2 and 3). NQKLIANQF (hereafter, NQK-Q8) was detectable in the highest proportion of samples (5 out of 9; 55.6%). Notably, in those donors, 100% of ex vivo NQK-Q8 tetramer + CD8 + T cells were memory T cells, indicating pre-existing T cell immunity against SARS-CoV-2 in a subset of individuals carrying HLA-B*15:01 who did not have any previous contact with the virus (Fig. 1c and Supplementary Table 5).
NQK-Q8 was previously identified as immunodominant 28 , and these authors also demonstrated that NQK-Q8-specific T cells from HLA-B*15:01 + patients with infection were cross-reactive to the highly homologous peptide NQKLIANAF (hereafter, NQK-A8) from the seasonal coronaviruses HKU1-CoV and OC43-CoV. Given the high level of NQK-Q8-specific memory T cells compared with naive CD8 + T cells observed in the initial set of pre-pandemic samples that we tested (Fig. 1c), we sought to determine whether the same phenotype was also observed for the peptide NQK-A8 in additional samples from HLA-B*15:01 + individuals with no exposure to SARS-CoV-2. We performed ex vivo tetramer magnetic enrichment (TAME) with tetramers of each peptide, NQK-Q8 and NQK-A8, bound to HLA-B*15:01 (tet-Q8 and tet-A8, respectively) ( Fig. 1d, Extended Data Figs. 1 and 2 and Supplementary  Fig. 4) using a larger number of additional samples (n = 7 and n = 5 for tet-Q8 and tet-A8, respectively). We used pre-pandemic samples from the USA and Australia according to our previous study 25 . All of these samples exhibited tetramer + CD8 + T cells. Overall, we observed NQK-Q8-specific T cells in 75% of HLA-B*15:01 + donors with no previous exposure to SARS-CoV-2 (n = 12 out of 16; Fig. 1a-d and Extended Data Fig. 1).
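The homology between the two peptides named above can be made explicit with a position-by-position comparison. This is a minimal sketch; the helper name is illustrative, and the sequences are those given in the text.

```python
def peptide_differences(a: str, b: str):
    """Return (1-based position, residue in a, residue in b) for each mismatch."""
    if len(a) != len(b):
        raise ValueError("peptides must have equal length")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]


NQK_Q8 = "NQKLIANQF"  # SARS-CoV-2
NQK_A8 = "NQKLIANAF"  # OC43-CoV / HKU1-CoV

diffs = peptide_differences(NQK_Q8, NQK_A8)
identity = 1 - len(diffs) / len(NQK_Q8)
print(diffs)  # [(8, 'Q', 'A')]
```

The single mismatch at position 8 (Gln in SARS-CoV-2, Ala in the seasonal coronaviruses) is what gives the two peptides their Q8 and A8 labels.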
We next analysed the phenotype of the tetramer + CD8 + T cells for both NQK peptides (Fig. 1d). We observed an abundance of memory T cells specific for NQK-Q8 in pre-pandemic samples ( Fig. 1d and Extended Data Fig. 2). A similar phenotypic profile was observed for the NQK-A8-specific T cells, with most tetramer + CD8 + T cells being memory cells ( Fig. 1d and Extended Data Fig. 2). The high proportion of effector memory and effector memory re-expressing CD45RA (T EMRA ) T cells (29% for NQK-Q8 and 31% for NQK-A8) indicates a potent T cell response towards these peptides, which is a desirable characteristic for protective immunity. We were able to detect ex vivo tetramer + CD8 + T cells for NQK-Q8 (n = 7) and NQK-A8 (n = 6) ( Fig. 1d and Extended Data Fig. 1). Overall, our data show the presence of CD8 + T cells specific for both NQK peptides, with a similar phenotype and magnitude in unexposed donors.
in unexposed individuals. We set up T cell lines with each peptide separately using PBMCs from unexposed and unvaccinated donors (n = 5, in vitro). Each cell line was then restimulated with either peptide. CD8 + T cell recognition and activation were determined by tetramer staining and intracellular cytokine staining, respectively ( Fig. 2a and Supplementary Fig. 4). All of the cell lines were characterized by the presence of tetramer + CD8 + T cells for both peptides ( Supplementary Fig. 5), showing a bidirectional cross-reactivity whereby T cells can recognize both NQK peptides derived from the different viruses. The magnitude of tetramer + CD8 + T cells was different among donors and slightly higher for the cell lines generated with the NQK-A8 peptide (Fig. 2b).
We subsequently measured the level of T cell responsiveness to each peptide. T cells from all donors, apart from GR8, responded to the cognate peptide (Fig. 2a). In four out of five donors, a stronger response was observed in the NQK-A8-specific cell lines independent of the peptide (Fig. 2a). These findings were similar to what we previously observed for the N(105-113) peptides derived from SARS-CoV-2 and HKU1-CoV/ OC43-CoV, in the context of HLA-B*07:02, whereby the T cells were more strongly stimulated after presentation of the peptide derived from seasonal coronavirus compared with the SARS-CoV-2-derived peptide in unexposed donors 25 .
Notably, although the percentage of cytokine + CD8 + T cells was different between donors (around 0-8%; Fig. 2a), the level and profile of cytokines were comparable for each cell line between the two peptides. This observation suggests that both NQK peptides stimulate T cells at similar magnitudes, which may reflect a high level of T cell cross-reactivity. We next examined the functional profile of the CD8 + T cells, which corresponds to their ability to exhibit different effector functions (IFNγ, TNF, IL-2, MIP-1β, CD107a; Fig. 2c). A highly polyfunctional T cell response, with up to five functions expressed, was observed for both cell lines that were restimulated with either peptide. At least 40% (n = 2 out of 5) and up to 80% (n = 4 out of 5) of donors had T cells exhibiting all five functions in one or more cell line. An average of 2.5-4.7% of CD8 + T cells exhibited all five functions tested ( Fig. 2c and Extended Data Fig. 3).
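The polyfunctionality readout described above amounts to counting, for each responding cell, how many of the five effector functions it expresses. The sketch below shows that bookkeeping on made-up cells. The data, the marker spellings and the function name are all hypothetical, and the actual analysis would be carried out on gated flow cytometry events rather than Python sets.

```python
from collections import Counter

FUNCTIONS = ("IFNg", "TNF", "IL2", "MIP1b", "CD107a")


def polyfunctionality_profile(cells):
    """Relative frequency (%) of responding cells by number of functions.

    `cells` is an iterable of sets naming the markers each cell is positive
    for. Cells with no function are excluded, mirroring a cytokine+ gate.
    """
    counts = Counter(len(set(cell) & set(FUNCTIONS)) for cell in cells)
    counts.pop(0, None)  # drop non-responding cells
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        n: 100.0 * counts.get(n, 0) / total
        for n in range(1, len(FUNCTIONS) + 1)
    }


# Made-up example cells, NOT data from the study.
example_cells = [
    {"IFNg", "TNF", "IL2", "MIP1b", "CD107a"},
    {"IFNg", "TNF"},
    {"CD107a"},
    {"IFNg", "TNF", "MIP1b"},
    set(),
]
profile = polyfunctionality_profile(example_cells)
print(profile)
```

Expressing the profile relative to responding cells only matches the convention stated in the Fig. 2c legend, where frequencies are given as a share of total cytokine-positive CD8 + T cells.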
In summary, the T cell response against NQK peptides from seasonal coronaviruses and SARS-CoV-2 in individuals never exposed to SARS-CoV-2 is highly cross-reactive and polyfunctional.

NQK-specific T cells express public TCRs
We next determined the T cell receptor (TCR) repertoire specific to both NQK-A8 and NQK-Q8 peptides. Using single-cell sorting of tetramer + CD8 + T cells and TCR sequencing, we obtained 456 productive clonotypic sequences from eight unexposed donors and one triple-vaccinated donor (Supplementary Table 7). The TCR repertoires of T cells specific to each or both NQK peptides (single or double tetramer + CD8 + T cells) were obtained from both T cell lines (in vitro) and unstimulated PBMCs (ex vivo) (Fig. 3a and Supplementary Fig. 6).

Fig. 2 | NQK-specific T cells are cross-reactive. a, Total cytokine production by CD8 + T cells in NQK-A8- and NQK-Q8-specific T cell lines. Each peptide-specific T cell line was restimulated individually with its cognate peptide or the homologous peptide as indicated, and the cytokine response was measured by intracellular cytokine staining (n = 5 donors). Percentages of effector functions (IFNγ, TNF, IL-2, MIP-1β, CD107a) minus the no peptide control are reported. b, In vitro tetramer analysis for the NQK-Q8- and NQK-A8-specific T cell lines (n = 5 donors). The cell lines were tetramer stained with a single tet-A8 (orange bar) or tet-Q8 (purple bar) tetramer or both tetramers (green bar). The frequency of tetramer + CD8 + T cells is shown. Data are median ± interquartile range. Differences between two groups were compared using two-tailed unpaired t-tests. P < 0.05 was considered to be significant. NS, not significant. c, Polyfunctionality analysis of CD8 + NQK-peptide-specific T cells from five unexposed donors. The number of functions is shown on a scale from 5 (black) to 1 (white). Data are the relative frequency (%) of total cytokine + CD8 + T cells.
Data are mean ± s.e.m. Differences between two groups were determined using two-tailed unpaired t-tests. P < 0.05 was considered to be significant, and the result was not significant.
NQK-specific TCRs were TRBV7-2/7-8 + paired with TRAV21 or TRAV9-2 and were observed both ex vivo and in vitro in unexposed donors (Fig. 3c,d and Supplementary Table 8). These public and cross-reactive TCRs have also been isolated previously from donors who had recovered from COVID-19 and/or vaccinated donors 28 (Supplementary Table 8). This finding shows the potential for a strong pre-existing immune memory pool of cross-reactive public TCRs specific to the NQK peptides that is similar to that observed after infection. The phenotypic profiles of the clonotypes expressing the public TCRs ex vivo in unexposed donors were either stem memory or central memory T cells.
To summarize, the NQK-Q8- and NQK-A8-specific TCR repertoires are biased with the presence of public and cross-reactive TCRs in unexposed donors, comparable to observations in donors who have recovered from COVID-19 and/or vaccinated donors (Supplementary Figs. 6 and 7). These findings suggest that the existence of memory cross-reactive NQK-specific TCRs before infection could have a critical role in the immune response to SARS-CoV-2, contributing to asymptomatic infection in individuals carrying HLA-B*15:01.

NQK peptides share structure similarity
The HLA-B*15:01-restricted peptide NQK-Q8 is conserved among all SARS-CoV-2 variants, even the new XBB variant of Omicron, and differs by only one amino acid from the HKU1-CoV/OC43-CoV peptide (Supplementary Fig. 8). Importantly, NQK-Q8-reactive T cells from HLA-B*15:01 + individuals infected with SARS-CoV-2 were previously shown to be cross-reactive to the homologous peptide from seasonal coronaviruses 28 . As we have shown the presence of pre-existing T cell immunity in pre-pandemic samples and cross-reactivity of the NQK peptide-specific T cells (Figs. 1 and 2), we sought to investigate whether [...] Table 9), indicating that the Gln>Ala amino acid change does not affect the overall stability of the pHLA. We crystallized each peptide in a complex with HLA-B*15:01 and solved their structures at high resolution (Fig. 4b, Extended Data Table 2 and Extended Data Fig. 6). Overall, NQK-Q8 adopted a canonical conformation within the antigen-binding cleft of the HLA-B*15:01 molecule 30 . The Gln at position 2 (P2) was deeply inserted into the B pocket of HLA-B*15:01, whereas the P9-Phe primary anchor residue bound to the F pocket. The central part of the peptide was more mobile than the rest (Extended Data Fig. 6). Five of the nine residues of NQK-Q8 (P1-Asn, P4-Leu, P6-Ala, P7-Asn and P8-Gln) were exposed to the solvent, and potentially to circulating T cells. The NQK-A8 peptide bound similarly in the HLA-B*15:01 cleft (Fig. 4b and Extended Data Fig. 6). The superposition of the two pHLA structures revealed very little difference between the two complexes, with a root mean squared deviation of 0.08 Å for the Cα atoms of the antigen-binding cleft (residues 1-180) and 0.12 Å for the peptides. The only difference was a local rearrangement around P8 of the peptide, where the Glu76 side chain shifts to avoid steric clashes with P8-Gln (Fig. 4b). This change, which is on the surface of the peptide, could affect T cell interaction and might change the TCR affinity.
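For readers less familiar with the metric, the root mean squared deviation quoted above is the square root of the mean squared distance between matched atoms after the two structures have been superposed. The sketch below computes it for toy coordinates. It assumes the superposition has already been done, the coordinates are invented for illustration, and it is not the software used for the structural analysis.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two matched lists of (x, y, z) points, already superposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Toy C-alpha positions in angstroms, NOT taken from the crystal structures.
structure_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
structure_b = [(0.1, 0.0, 0.0), (1.5, 0.1, 0.0), (3.0, 0.0, 0.1)]
value = rmsd(structure_a, structure_b)
print(f"RMSD = {value:.2f} A")
```

Each toy atom is displaced by 0.1 Å, so the RMSD is 0.1 Å, which is the same order of magnitude as the values reported for the two pHLA complexes.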
To test the effect of the P8 difference within the NQK peptides, we selected some representative TCRs to perform affinity measurements using surface plasmon resonance (SPR). We selected the three TRBV7-2 + public TCRs paired with either TRAV9-2 (D9A TCR) or TRAV21 (A6A and D5A TCRs) (Fig. 3c,d and Supplementary Table 8).
All of the TCRs tested were able to bind with high affinity (Kd range, 6-20 μM) to both the NQK-A8 and NQK-Q8 peptides presented by the HLA-B*15:01 molecule (Fig. 3c,d, Extended Data Figs. 7 and 8 and Supplementary Table 10). We observed a slower dissociation rate for the two TRAV21 + TCRs (A6A and D5A TCRs) compared with the TRAV9 + D9A TCR (Extended Data Fig. 7). The TCRs also differ in their CDR3β loops.
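To give a sense of what a dissociation constant in this range means, the sketch below uses a simple 1:1 binding model. This model is an assumption, since the fitting procedure is not described in this section. The rate constants and analyte concentration are hypothetical and are not measurements from the study.

```python
def kd_from_rates(k_off: float, k_on: float) -> float:
    """Dissociation constant (M) from k_off (1/s) and k_on (1/(M*s))."""
    if k_on <= 0:
        raise ValueError("k_on must be positive")
    return k_off / k_on


def fraction_bound(concentration_um: float, kd_um: float) -> float:
    """Equilibrium occupancy for 1:1 binding: C / (Kd + C)."""
    if concentration_um < 0 or kd_um <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return concentration_um / (kd_um + concentration_um)


# Occupancy at a hypothetical 10 uM analyte concentration for the two ends
# of the Kd range quoted in the text (6 and 20 uM).
tight = fraction_bound(10.0, 6.0)
weak = fraction_bound(10.0, 20.0)

# Hypothetical rate constants: at equal k_on, a slower k_off lowers Kd.
kd_fast_off = kd_from_rates(k_off=0.2, k_on=1e4)
kd_slow_off = kd_from_rates(k_off=0.1, k_on=1e4)
print(f"occupancy: {tight:.3f} vs {weak:.3f}")
```

In this model a slower dissociation rate at an unchanged association rate translates directly into a lower Kd, which is why off-rates are informative when comparing TCRs.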
In a previous study 28 , the cross-reactivity of two NQK-Q8-specific TCRs (TCR1 and TCR2) transfected into Jurkat cells was shown. The published TCR1 and TCR2 28 differ by only one residue in each CDR3 loop from A6A and D5A TCRs, respectively (Supplementary Table 8). This is reflected by the similar affinity of the A6A and D5A TCRs for the NQK-A8 and NQK-Q8 peptides observed here (Supplementary  Table 10). Overall, our data show that the similar conformation and ability to stabilize the HLA-B*15:01 molecule of the two NQK peptides underpin the similar affinity observed for the public TCRs. Moreover, the high affinity of these TCRs towards the NQK peptides might trigger a rapid expansion of these memory cells after SARS-CoV-2 infection.

Discussion
Understanding the biological underpinning of asymptomatic infection with SARS-CoV-2 has important implications for public health measures, vaccine design and therapeutic development. Here we provide evidence of a genetic basis and the mechanistic explanation underlying asymptomatic disease. Leveraging a large database and mobile technology in this crowd-sourced study, we reveal important insights into the immunogenetic underpinnings of asymptomatic SARS-CoV-2 infection. Our use of a mobile application and a pre-existing database for medical research enabled us to screen nearly 30,000 individuals who were previously genotyped for HLA for viral infection and disease course. We augment our findings of a strong HLA association with asymptomatic disease course in this unique cohort with functional and structural studies to support a model of pre-existing immunity to explain the observed HLA association.
We show that, among participants reporting a positive test result for SARS-CoV-2, HLA-B*15:01 is significantly associated with asymptomatic infection. We observed that individuals carrying this common allele (approximately 10% in individuals with European ancestry) are more than twice as likely to remain asymptomatic after SARS-CoV-2 infection compared with those who do not. We also observed a notable effect of HLA-B*15:01 homozygosity, which increased the chance of remaining asymptomatic by more than eight times. This suggests important features of early infection with SARS-CoV-2. Supporting the role of HLA-B*15:01 in mediating asymptomatic infection, we found a highly similar frequency distribution of this allele in asymptomatic versus symptomatic patients in two independent cohorts.
Despite a growing number of published studies, the role of HLA variation in COVID-19 has remained unclear, with no clear consensus in the literature to date and, notably, few studies examining asymptomatic infection as a primary phenotype 18 [...] asymptomatic infection 22 uncovered clear evidence for the role of HLA-B*15:01 in asymptomatic disease, which was not reported in the initial study. Although the data in our discovery cohort did not corroborate the association for HLA-DRB1*04:01 alone, we did find that this allele enhanced the effect of HLA-B*15:01 when the two alleles occurred in combination. Note that this is the HLA-DRB1 allele that is most commonly associated with HLA-B*15:01 in individuals in the United States who self-identify as white 31 and it is therefore difficult to differentiate a real effect from one related to linkage disequilibrium between these loci unless directly tested. Similarly, another recent paper describing an association of HLA-DRB1 alleles with asymptomatic infection did not genotype for HLA-B 32 . Finally, two other large studies that used patient questionnaires regarding symptoms did not consider the mildest symptoms that are common in SARS-CoV-2 infection (for example, runny nose and scratchy throat), resulting in a much less stringent definition of asymptomatic infection than we considered here 33,34 .
Respiratory tract infections are a major public health concern, representing a substantial burden, particularly for young children and the elderly 35,36 . Four strains of seasonal coronaviruses (229E-CoV, NL63-CoV, OC43-CoV and HKU1-CoV) represent 15% to 30% of all respiratory tract infections every year 37 . Notably, previous studies have shown that T cells can cross-react to SARS-CoV-2 and seasonal coronavirus peptides, indicating that long-lasting T cell protective immunity can potentially limit the severity of COVID-19 25 . Moreover, a recent study 28 demonstrated T cell cross-reactivity to SARS-CoV-2 and seasonal coronaviruses for an HLA-B*15:01-restricted immunodominant epitope (NQK-Q8) in individuals who received two doses of the Pfizer-BioNTech BNT162b2 mRNA vaccine. To test the hypothesis that HLA-B*15:01 can mediate asymptomatic disease through pre-existing T cell immunity, we analysed immunodominant epitopes in T cells from human PBMCs from pre-pandemic healthy individuals. We observed that T cells from a subset of healthy donors carrying HLA-B*15:01 who were never exposed to SARS-CoV-2 were reactive to the SARS-CoV-2 peptide NQK-Q8, and most of the reactive cells displayed a memory phenotype. The sequence identity between the SARS-CoV-2 and seasonal coronavirus peptides, except for a single amino acid substitution, could explain the T cell cross-reactivity. However, a direct demonstration that peptides from SARS-CoV-2 and the seasonal coronaviruses OC43-CoV and HKU1-CoV are stable in the HLA-B*15:01 cleft was necessary to further corroborate our hypothesis.
Through our examination of the crystal structures of the HLA-B*15:01 molecule in the presence of each peptide, we demonstrated that both NQK-Q8 (SARS-CoV-2) and NQK-A8 (OC43-CoV and HKU1-CoV) spike peptides share a similar ability to stabilize the HLA-B*15:01 molecule, and are presented in a similar conformation by HLA-B*15:01, providing the molecular basis for T cell cross-reactivity and pre-existing immunity. This observation is in accordance with previous research in uninfected HLA-B*07:02 + individuals who are able to recognize the N(105-113) peptide derived from SARS-CoV-2 due to the presence of cross-reactive T cells recognizing the homologous N(105-113) peptide from OC43-CoV and HKU1-CoV 25 . Notably, this T cell cross-reactivity has been associated with less severe COVID-19 disease 38 .
Our data show that both seasonal and pandemic coronavirus-derived NQK peptides can lead to a highly polyfunctional T cell response in the context of HLA-B*15:01, with T cells able to exhibit different effector functions (IFNγ, TNF, IL-2, MIP-1β, CD107a). T cell polyfunctionality is critical as it can lead to superior viral-suppressive activity 39,40 ; it is linked with high-affinity TCRs that can detect low levels of antigens 39,41 , and it is predictive of protective immunity and vaccine efficacy [42][43][44] . The high level of polyfunctionality observed for the CD8 + T cells towards the NQK-Q8 peptide in unexposed individuals contrasts with the moderate polyfunctionality observed for the HLA-B*07:02-restricted SARS-CoV-2 N(105-113) peptide (SPRWYFYYL) that also shares a high sequence similarity with the OC43/HKU-1-CoV N(105-113) peptide (LPRWYFYYL) 25 . Although there was some biased TCR gene usage observed in HLA-B*07:02-N(105-113)-specific TCR repertoire, we did not observe the presence of public TCRs 25 . By contrast, we observed that HLA-B*15:01-NQK-specific TCR repertoire was characterized by the existence of public TCRs. The memory phenotype of the public TCRs in unexposed donors also strongly suggests that they might provide a protective advantage to HLA-B*15:01 + donors infected by SARS-CoV-2. The presence of a high level of memory, polyfunctional, high-affinity and public cross-reactive T cells in unexposed donors probably underpins the strong association between the allele HLA-B*15:01 and asymptomatic SARS-CoV-2 infection. The presence of cross-reactive public TCRs before infection, observed also after infection or vaccination 28 , could provide a fast and protective immune response in individuals carrying HLA-B*15:01. This characteristic could make HLA-B*15:01 a generally more protective allele than HLA-B*07:02, which has been described as potentially limiting the severity of COVID-19 disease 38 . 
Our results also show the importance of pre-existing immunity 25,28,45,46 giving rise to a memory pool of cross-reactive T cells ready to fight infection 23,37 .
Examination of T cells in patients with asymptomatic SARS-CoV-2 infection has suggested robust T cell responses similar to those in patients with symptomatic disease 47 . Recent studies have shown that SARS-CoV-2-specific memory T cells are enriched at the site of infection compared with the blood 48 . Thus, the low frequencies of memory T cells that we observed in the blood are probably an under-representation of the resident antigen-specific memory T cells in the respiratory tract that rapidly respond to antigen restimulation at the viral entry sites. Presumably, a pre-existing resident memory T cell population at viral entry sites can lead to a rapid viral clearance before the overt onset of symptoms. Furthermore, the finding that T cells in asymptomatic infection secrete higher quantities of IFNγ compared with those in symptomatic patients early in infection 44,47 supports a role for memory T cells at this stage 49 . Although the current literature is mixed regarding cross-reactive CD8 + T cells specific to SARS-CoV-2, this might be explained by HLA specificity 50,51 .
Together, our results strongly support the hypothesis that HLA-B*15:01 mediates asymptomatic COVID-19 disease through pre-existing T cell immunity due to previous exposure to HKU1-CoV and OC43-CoV. Notably, the NQK-Q8 peptide is conserved among SARS-CoV-2 variants; moreover, among all HLA-B*15:01-restricted SARS-CoV-2-derived T cell epitopes reported in the Immune Epitope Database (Supplementary Fig. 8), no other epitope exhibits high sequence identity across common coronaviruses, except for the replicase polyprotein 1ab peptide QLYLGGMSY. However, this last epitope has not been reported in the literature as immunodominant in patients positive for HLA-B*15:01 and SARS-CoV-2. On the basis of the limited data available regarding known HLA-B*15:01 epitopes in SARS-CoV-2 patients, NQK-Q8 remains the prime candidate peptide underlying any HLA-B*15:01-mediated T cell cross-immunity with seasonal coronaviruses.
One limitation of this study is that all of the testing results and symptoms in our discovery cohort are self-reported. We recognize that this may result in some margin of error in our results. However, we have previously validated this approach by verifying test results in a subset of the participants 52 . Similarly, we did not query some symptoms, notably those related to rash and simple nasal congestion (as opposed to runny nose, which we consider here), and did not consider individuals with only a single symptom report within our two-week window as symptomatic, which may have resulted in some individuals being categorized as asymptomatic when in fact they experienced mild symptoms. However, we incorporated an additional 'sanity check' into our classification of asymptomatic disease, where we considered the response to the survey question regarding their reasons to seek testing for SARS-CoV-2 infection. Thus, although our self-report methodology means that we cannot definitively state that our asymptomatic cohort was entirely free of any symptoms (and in some cases may, rather, be considered mildly symptomatic), we feel confident that our classification of individuals as asymptomatic was generally robust. Importantly, we find a very consistent genetic association across the study and in two independent cohorts where asymptomatic disease was clinician-defined, pointing to a true biological feature.
Nature | Vol 620 | 3 August 2023 | 135
Another limitation is that our association results are limited to individuals who self-identify as white. While our study cohorts were not well-powered in this regard, we find that our results for HLA-B*15:01 appear to trend similarly in Black individuals, although this result is less clear in Asian and Hispanic individuals (Supplementary Table 11). However, owing to the paucity of individuals combined with the lower frequency of this allele in some populations, it is impossible to conclude whether our results for HLA-B*15:01 association with asymptomatic disease are applicable in these ancestries. A final limitation is that we tested only four SARS-CoV-2 peptides in our ex vivo analysis. The search for additional candidate peptides will be facilitated as more studies analyse T cell reactivity in patients carrying HLA-B*15:01, similarly to a previous study 28 . However, we identified at least one SARS-CoV-2 peptide previously known to be immunodominant in SARS-CoV-2 infection that was reactive to memory T cells from HLA-B*15:01 + individuals collected before the pandemic.
In summary, we have demonstrated a strong and significant association of a common HLA class I allele, HLA-B*15:01, with asymptomatic infection with SARS-CoV-2. We demonstrated that HLA-B*15:01 + T cells from pre-pandemic samples were reactive to an immunodominant SARS-CoV-2 peptide that shares high sequence similarity with peptides from seasonal coronaviruses. We provided the molecular basis of T cell cross-reactivity by showing that HLA-B*15:01 can stabilize and present peptides from HKU1-CoV and OC43-CoV similarly to the immunodominant peptide from SARS-CoV-2. Moreover, we show that public clonotypes were cross-reactive, polyfunctional and able to recognize both NQK peptides with high affinity. Our results have important implications for understanding early infection and the mechanism underlying early viral clearance and may lay the groundwork for refinement of vaccine development and therapeutic options in early disease.

Online content
Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41586-023-06331-x.
Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Data collection for the discovery (Citizen Science) cohort
The study participants were volunteer bone marrow donors with valid e-mail addresses on file with the National Marrow Donor Program (NMDP) who were invited to participate in the study through an e-mail outreach campaign that began in July 2020. All individuals had within the NMDP database a pre-existing record for high-resolution HLA genotyping, typically for five loci (HLA-A, -B, -C, -DRB1 and -DQB1) 31 . Participants who opt in to the study are required to download a smartphone app and participate in the COVID-19 Citizen Science Study (launched using the Eureka Digital Research Platform; https://eureka.app.link/covid19) or, as of January 2021, participate through the website (https://covid19.eurekaplatform.org/). Once enrolled, the participants are asked to complete an initial 10 to 15 min survey about baseline demographics, their health history and daily habits. Follow-up daily questions specific to symptoms, weekly questions regarding testing and monthly questions regarding hospitalization for COVID-19 are delivered by push notification or text message on an ongoing basis and require 5 to 15 min per week. As of 30 April 2021, we enrolled 29,947 individuals, of whom 21,893 have completed their baseline survey (Supplementary Table 12). Participation in the UCSF Citizen Science study and linking to NMDP HLA data were approved by the Institutional Review Board for the University of California, San Francisco (IRB 17-21879 and IRB 20-30850, respectively). All of the participants provided written informed consent agreeing to research and publication of research results.
Within the mobile application, the survey respondents are asked during their initial baseline survey whether they have ever been tested for active infection and report the result (positive, negative, do not know) and the approximate number of weeks since the test. Thereafter, each week respondents are asked whether they were tested in the previous week, and to report the result. We considered anybody reporting a positive test for active infection as having been infected with SARS-CoV-2. Our cohort consisted of individuals reporting a positive test for virus up to 30 April 2021 before the implementation of widespread vaccination for the virus. We restricted the analysis to individuals who had self-identified as 'white' only due to insufficient numbers for analysis in the other groups (Supplementary Tables 13 and 14), allowing an analysis of 1,428 individuals. The inclusion criteria are provided in Supplementary Fig. 9.
Symptoms are self-reported at the baseline and in daily surveys. Within the baseline survey, the respondents are asked to report whether they had any of a list of symptoms (Supplementary Table 15) for 3 days or longer at any time since February 2020. These same symptoms are queried in each daily survey, where respondents are asked whether they experienced each symptom within the previous 24 h. Among those individuals, we considered those as asymptomatic who reported having had a positive test for active virus at the baseline, with a time since the test of longer than 2 weeks or who did not specify test dates, and who reported "None of the above" for all symptoms in the baseline survey. We also considered daily symptom reports for the 2 weeks after the baseline for respondents who reported a positive test for active infection at the baseline as having occurred within the previous 2 weeks. In these cases, we considered individuals asymptomatic if, in addition to reporting no symptoms at baseline, they did not report any single symptom two or more times within this time period. For individuals who did not report a positive test for active infection at the baseline, but subsequently reported a positive test on a weekly survey, we used the same criteria considering daily symptom reports for the period 2 weeks before and 2 weeks after the positive test report (Supplementary Fig. 10). To further confirm a lack of symptoms, we also considered the survey question "Why was the test for active COVID-19 infection (virus) performed?" (Supplementary Table 16). Individuals who otherwise did not report symptoms, but responded "I had symptoms concerning for COVID-19 infection" were categorized as symptomatic.
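The classification rules above can be sketched in code. The following is a minimal illustration, not the study's actual pipeline: the `Respondent` record, its field names and the `is_asymptomatic` function are hypothetical, and the sketch assumes that the caller has already restricted `daily_reports` to the relevant 2-week window (or the 2 weeks before and after a positive weekly test report).

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Survey answer that overrides an otherwise symptom-free record (quoted from the text).
SYMPTOM_REASON = "I had symptoms concerning for COVID-19 infection"


@dataclass
class Respondent:
    """Hypothetical record layout; field names are illustrative only."""
    # Symptoms ticked in the baseline survey; empty means "None of the above".
    baseline_symptoms: List[str] = field(default_factory=list)
    # One list of reported symptoms per daily survey inside the analysis window.
    daily_reports: List[List[str]] = field(default_factory=list)
    # Answer to "Why was the test for active COVID-19 infection (virus) performed?"
    test_reason: str = ""


def is_asymptomatic(r: Respondent) -> bool:
    """Apply the rules described above to a single test-positive respondent."""
    if r.baseline_symptoms:
        return False
    if r.test_reason == SYMPTOM_REASON:
        return False
    # Count on how many days each symptom was reported within the window.
    counts: Dict[str, int] = {}
    for day in r.daily_reports:
        for symptom in set(day):
            counts[symptom] = counts.get(symptom, 0) + 1
    # Any single symptom reported two or more times makes the respondent symptomatic.
    return all(n < 2 for n in counts.values())
```

Under this sketch, a respondent who reports cough once and fever once is still treated as asymptomatic, whereas two reports of cough are not, which mirrors the "any single symptom two or more times" rule.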

The CHIRP and LIINC (replication) cohorts
The study participants were enrolled in two UCSF-based prospective longitudinal cohorts: the CHIRP study and the LIINC study. The participants were identified through local clinical systems (UCSF Moffitt Hospital, San Francisco General Hospital, Kaiser, California Pacific Medical Center) as well as the San Francisco Department of Public Health. After confirmation of SARS-CoV-2 test results or exposure to determine eligibility, the participants were asked to sign a consent form, complete a baseline visit and schedule follow-up in-person visits. The CHIRP study included volunteers with positive PCR test documentation and/or symptom onset within the preceding 21 days. Asymptomatic disease was defined as having a confirmed positive PCR test with lack of any symptoms ("Did you have or are you still having any symptoms that you think are because of COVID-19?") at the baseline and follow-up visits. A total of five longitudinal samples were collected from participants with acute SARS-CoV-2 infection. The first sample was collected <31 days of symptom onset or <31 days from exposure to SARS-CoV-2 as a week 0 baseline visit. The remaining samples were collected at weeks 1, 3, 10 and 24. At each CHIRP visit, blood and nasopharyngeal swabs were collected. Optional sample collection included sputum, saliva, stool and urine. The LIINC study enrolled participants with previous SARS-CoV-2 infection confirmed on clinical nucleic acid amplification testing between 14 and 90 days after initial symptom onset. After written informed consent, clinical data and biospecimens were collected monthly for up to 4 months after initial symptom onset and then every 4 months thereafter. Biospecimens including blood and saliva were collected at each visit. CHIRP and LIINC used a harmonized set of case report forms to collect clinical data about demographics, medical history, the COVID-19 illness and post-acute symptoms. 
Clinical measurements collected during in-person CHIRP visits included complete blood count with differential, comprehensive metabolic panel, erythrocyte sedimentation rate, high-sensitivity C-reactive protein, d-dimer, lactate dehydrogenase and ferritin. All of the participants provided written informed consent agreeing to research and publication of research results, and the CHIRP and LIINC studies were approved by the Institutional Review Board for the University of California, San Francisco (IRB 20-30588 and 20-30479, respectively).

HLA genotyping in the CHIRP/LIINC cohort
A total of 100 ng of high-quality DNA was fragmented using the Library Preparation Enzymatic Fragmentation Kit 2.0 (Twist Bioscience). Subsequently, the ends of the fragmented DNA were repaired, poly(A) tails were added and ligated through PCR to Illumina-compatible dual index adapters that were uniquely barcoded. After ligation, fragments were purified with a 0.8× ratio AMPure XP magnetic beads (Beckman Coulter), followed by dual-size selection (0.42× and 0.15× ratios) to select libraries of approximately 800 bp. Finally, libraries were amplified and purified with magnetic beads.
After fluorometric quantification, 30 ng of each sample was precisely pooled using ultrasonic acoustic energy, and the targeted capture was performed using the Twist Target Enrichment kit (Twist Bioscience). In brief, the volumes were reduced using magnetic beads, and the DNA libraries were bound to 1,394 biotinylated probes specific to the HLA region, covering all exons, introns and regulatory regions of HLA-A, HLA-B, HLA-C, HLA-DRB1, HLA-DRA, HLA-DQB1, HLA-DQA1, HLA-DPB1 and HLA-DPA1. Fragments targeted by the probes were captured using Streptavidin magnetic beads and then amplified and purified. Enriched libraries were analysed using the BioAnalyzer (Agilent) and quantified by droplet digital PCR. Finally, enriched libraries were sequenced using the NovaSeq platform (Illumina) with a paired-end 150 bp sequencing protocol. After sequencing, data were analysed using HLA Explorer v.1.4 (Omixon) and AlloSeq Tx V471 (CareDx).

UK cohort reanalysis
We reanalysed the primary data from a previous study 22 that analysed HLA class I and class II genes in 147 individuals of European ancestry with known SARS-CoV-2 infection and a range of symptoms and 69 asymptomatic hospital workers. In the initial publication, HLA-B*15:01 was not directly tested for association with asymptomatic disease course. HLA genotyping methods, allele frequencies, demographics and clinical outcomes are as previously described.

HLA association analysis
In our discovery cohort, we examined the association of five HLA loci (HLA-A, -B, -C, -DRB1 and -DQB1) with asymptomatic versus symptomatic infection. Data analysis included the first two fields of the allele name as described in the HLA nomenclature, representing the complete molecule at polypeptide sequence resolution.
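As an illustration of the two-field resolution described above, the sketch below truncates a full allele name to its first two fields. The function name `two_field` is hypothetical, and the sketch deliberately ignores expression suffixes (such as a trailing N or L), which a real pipeline would need to handle.

```python
def two_field(allele: str) -> str:
    """Truncate an HLA allele name to its first two fields.

    Example input: 'HLA-B*15:01:01:01'. Expression suffixes are not handled.
    """
    gene, sep, fields = allele.partition("*")
    if not sep:
        raise ValueError(f"not an HLA allele name: {allele!r}")
    parts = fields.split(":")
    if len(parts) < 2:
        raise ValueError(f"fewer than two fields in: {allele!r}")
    return f"{gene}*{parts[0]}:{parts[1]}"


print(two_field("HLA-B*15:01:01:01"))  # prints HLA-B*15:01
```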
Initial testing for HLA associations was performed using the R package BIGDAWG 53 , which handles multiallelic HLA data to test for association at the haplotype, locus, allele and amino acid levels. We next fitted a generalized linear model with glm from the R (v.4.3) stats package to consider the relevant covariates, including any reported comorbidities, sex and age, for alleles that were initially found to be associated with asymptomatic infection after correction for multiple testing. We corrected P values using the Bonferroni method 54 for the number of alleles tested at HLA-A, -B and -DRB1, which accounts for the strong linkage disequilibrium between some of the loci tested. For the replication cohorts, we tested only the allele of interest, using the generalized linear model framework as described. Meta-analysis of the results in our three cohorts was performed in R using the common effect model with the meta package (v.6.2-1) 55 .
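The two arithmetic steps in this paragraph, Bonferroni correction and common-effect pooling, can be sketched as follows. This is a Python illustration rather than the R code used in the study, and it assumes inverse-variance weighting of per-cohort log odds ratios; the source does not state which pooling method or `meta` function was used. The function names are hypothetical and no study data appear here.

```python
import math
from typing import List, Optional, Sequence, Tuple


def bonferroni(p_values: Sequence[float], n_tests: Optional[int] = None) -> List[float]:
    """Multiply each P value by the number of tests, capping at 1."""
    n = len(p_values) if n_tests is None else n_tests
    return [min(1.0, p * n) for p in p_values]


def common_effect(log_ors: Sequence[float], ses: Sequence[float]) -> Tuple[float, float]:
    """Inverse-variance common-effect pooling of per-cohort log odds ratios.

    Returns the pooled log odds ratio and its standard error.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / total
    return pooled, math.sqrt(1.0 / total)


def odds_ratio_ci(pooled: float, se: float, z: float = 1.96) -> Tuple[float, float, float]:
    """Back-transform a pooled log odds ratio to an odds ratio with a 95% interval."""
    return math.exp(pooled), math.exp(pooled - z * se), math.exp(pooled + z * se)
```

Passing `n_tests` explicitly reflects the approach described above, in which the correction factor is the number of alleles tested at HLA-A, -B and -DRB1 rather than the number of P values at hand.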

Peptide synthesis
The four SARS-CoV-2 peptides (Fig. 1a-c and Supplementary Table 17) were synthesized by in vitro transcription and translation of oligonucleotides encoding each peptide using the PURExpress in vitro protein synthesis kit (New England BioLabs) as previously described 56 . Peptides NQK-Q8 (NQKLIANQF) and NQK-A8 (NQKLIANAF) (used in Figs. 1d and 2-4) were synthesized using the fluoren-9-ylmethoxycarbonyl (Fmoc) method of solid-phase peptide synthesis. The process was performed using the Biotage Initiator+ Alstra automated peptide synthesizer and Wang resin (100-200 mesh, 1.24 mmol g −1 ), which was swollen in dimethylformamide (DMF) for 2 h before use. The Fmoc-amino acids and HCTU/HOBt/DIPEA (4 eq.) dissolved in DMF were then added sequentially to the resin and the coupling reactions were carried out with microwave heating at 60 °C for 20 min. The Fmoc protecting groups were removed using piperidine (20% in DMF) at room temperature for 20 min. The peptides were cleaved from the resin using a mixture of 95% trifluoroacetic acid (TFA) and 5% triisopropylsilane (TIPS) for 3 h. After evaporation of the TFA/TIPS mixture and precipitation with diethylether, the crude peptides were purified using reversed-phase high-performance liquid chromatography (HPLC) on the Shimadzu HPLC system fitted with two Shimadzu LC-20AD pumps, a SIL-20AHT autosampler, SPD-M20A photodiode array detector and a FRC-10A fraction collector and an Onyx Monolithic C18, 100 × 10 mm semipreparative column with a solvent system consisting of 0.1% TFA in water and acetonitrile over 30 min. The purified peptides were lyophilized and stored at −20 °C. The purity was confirmed to be >95% in each case by analytical HPLC and the structures were confirmed using high-resolution electrospray ionization mass spectrometry (Supplementary Fig. 11).
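The peptide names Q8 and A8 refer to the residue at position 8, the only position at which the two 9-mers differ. A short sketch makes this explicit; the helper name `differing_positions` is illustrative and is not part of any published tool.

```python
from typing import List


def differing_positions(a: str, b: str) -> List[int]:
    """Return the 1-based positions at which two equal-length peptides differ."""
    if len(a) != len(b):
        raise ValueError("peptides must have the same length")
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]


nqk_q8 = "NQKLIANQF"  # SARS-CoV-2 spike peptide
nqk_a8 = "NQKLIANAF"  # OC43-CoV and HKU1-CoV spike peptide

print(differing_positions(nqk_q8, nqk_a8))  # prints [8]
```

The two peptides are therefore identical at 8 of 9 residues.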

PBMCs
A total of 20 unexposed and 1 triple-vaccinated (VAC62) donors were recruited; all details are provided in Supplementary Table 18. PBMCs were separated from whole blood or buffy coats using density-gradient centrifugation. PBMCs were used fresh or were cryogenically stored until use. HLA-genotyped PBMCs from the USA were stored in the National Marrow Donor Program (NMDP)/Be The Match Research Sample Repository (ClinicalTrials.gov protocol NCT04920474) that had been collected from healthy donors before the start of the COVID-19 pandemic. All individuals consented to research and publication of research results and had been previously genotyped for HLA class I and class II. Ethics approval to undertake the research was obtained from the QIMR Berghofer Medical Research Institute Human Research Ethics Committee (P2282) and La Trobe University Human Research Ethics Committee (HEC21097). The HLA genotyping was performed by AlloSeq Tx17 (CareDx Pty) using AllType NGS high-resolution genotyping on the IonTorrent NGS platform.

TAME
Each peptide was loaded on biotinylated HLA-B*15:01 (either custom-made or purchased from easYmers immunAware) according to the manufacturer's instructions.
For the combinatorial tetramer staining that included all four SARS-CoV-2 peptides (Supplementary Table 17) related to Fig. 1a-c, peptide-loaded HLA-B*15:01 tetramers were generated using Streptavidin conjugated to PE, APC, PE-CF594 or BV421 according to the manufacturer's instructions. d-Biotin (500 μM) was subsequently added to each peptide-loaded tetramer and tetramers were pooled just before cell staining. Combinatorial tetramer staining was used to identify each epitope by unique combinations of dual fluorophores where at least one of the fluorophores contained PE (Supplementary Table 17). The frequencies of antigen-specific T cells were calculated as previously described 57 . In brief, an aliquot of PBMCs was used for cell surface staining and counting using 123count eBeads (Invitrogen). The remaining PBMCs were stained with the indicated tetramer pools and enriched using anti-PE magnetic microbeads (Miltenyi) over a magnetic column, cell-surface stained and counted as for pre-enrichment. CD8 + T cells were identified by gating live singlet CD8 + lymphocytes that were negative for CD4, CD14, CD16 and CD19 (Supplementary Fig. 2 and Supplementary Table 19). A stringent tetramer gating strategy was used whereby CD8 + T cells labelled with only two fluorophores were considered antigen specific. The memory status of tetramer-positive CD8 + T cells was determined by lack of CCR7 and CD45RA co-expression. Given the low numbers of cells available from donors, only tetramer + CD8 + T cells with frequencies of greater than 1 × 10 −4 per total CD8 + T cells were considered to be positive.
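The dual-fluorophore coding scheme can be enumerated in a few lines. This sketch assumes that "contained PE" covers both PE and the PE-CF594 tandem dye; that reading is an interpretation, although it is the one that yields enough codes for four peptides. The actual code-to-peptide assignment is given in Supplementary Table 17 and is not reproduced here. The positivity check applies the 1 × 10 −4 threshold stated above.

```python
from itertools import combinations
from typing import List, Tuple

FLUOROPHORES = ["PE", "APC", "PE-CF594", "BV421"]


def contains_pe(fluorophore: str) -> bool:
    """Assumption: PE and the PE-CF594 tandem both count as PE-containing."""
    return fluorophore.startswith("PE")


def dual_codes() -> List[Tuple[str, str]]:
    """All two-fluorophore codes in which at least one member contains PE."""
    return [pair for pair in combinations(FLUOROPHORES, 2)
            if any(contains_pe(f) for f in pair)]


def exceeds_threshold(frequency: float, threshold: float = 1e-4) -> bool:
    """True when the tetramer+ frequency per total CD8+ T cells is above threshold."""
    return frequency > threshold


for code in dual_codes():
    print(code)
```

Of the six possible pairs, only APC with BV421 is excluded, which leaves five usable codes for the four peptides.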

Single-cell multiplex PCR
Single-cell multiplex PCR was performed as previously described 59 . In brief, cDNA was generated using the VILO cDNA synthesis kit (Invitrogen) at 1/20 of the manufacturer's recommendations with 0.1% Triton X-100. Nested PCR comprising 40 α- and 27 β-chains was subsequently undertaken. PCR products were purified using ExoSAP (GE Healthcare) and were sequenced at AGRF. Sequences were analysed using FinchTV (Geospiza v.1.5.0) and IMGT software 60 . CDR3 sequences shown are all productive (no stop codons) 61 .

Analysis of the TCR clonotypic repertoire
TRA and TRB sequences were analysed using the software suite from the International ImMunoGeneTics (IMGT) Information System 62 . The V(D)J gene nomenclature used is that of the IMGT database (www.imgt.org). Motifs enriched were identified using the MEME suite motif discovery software (v.5.5.2.) 63 . The MEME software chooses the width and number of occurrences of each motif automatically to minimize the E-value of the motif. Motifs were searched in discriminative mode and were represented as sequence logos, where the relative sizes of the letters indicate their frequencies in the sequence set, and the total height of the letters represents the information content of the position, in bits.

Statistical analysis
GraphPad Prism 9 (v.9.3) was used to perform statistical analysis. Statistical comparisons between groups were determined using one-way or two-way analysis of variance with correction for multiple comparisons. P < 0.05 was considered to be significant.

DSF analysis
Thermal stability assays were performed by DSF using the ViiA 7 real-time PCR system (Thermo Fisher Scientific), in which the HLA-B*15:01-YFP complex was heated from 25 to 95 °C at a rate of 1 °C min −1 in 0.5 °C steps. The excitation and emission channels were set to the TAMRA reporter (x3m3 filter) with excitation of ~550 nm and detection at ~587 nm. The experiment was performed at two concentrations of pHLA (5 μM and 10 μM) in duplicates. Each sample was dialysed in 10 mM Tris-HCl pH 8.0 (Fisher Bioreagents), 150 mM NaCl (Merck) and contained a final concentration of 10× SYPRO Orange Dye (Invitrogen). Fluorescence intensity data were normalized and plotted using GraphPad Prism 9 (v.9.3). The T m value for a pHLA is equal to the temperature at which 50% of maximum fluorescence intensity is reached, which is equal to approximately 50% of unfolded protein and is summarized in Supplementary Table 9.
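The T m definition above can be sketched as a small calculation. The study used GraphPad Prism for normalization; the version below is an assumption-laden illustration that applies min-max normalization and linear interpolation between the two readings that bracket the first crossing of 50%. The function name `melting_temperature` is hypothetical.

```python
from typing import Sequence


def melting_temperature(temps: Sequence[float], fluorescence: Sequence[float]) -> float:
    """Estimate T m as the temperature at which normalized fluorescence first reaches 50%.

    Assumes min-max normalization and linear interpolation between readings.
    """
    if len(temps) != len(fluorescence) or len(temps) < 2:
        raise ValueError("need two equal-length series with at least two points")
    lo, hi = min(fluorescence), max(fluorescence)
    if hi == lo:
        raise ValueError("flat fluorescence trace; no transition to fit")
    norm = [(f - lo) / (hi - lo) for f in fluorescence]
    for i in range(1, len(norm)):
        if norm[i - 1] < 0.5 <= norm[i]:
            # Interpolate linearly between the two bracketing readings.
            frac = (0.5 - norm[i - 1]) / (norm[i] - norm[i - 1])
            return temps[i - 1] + frac * (temps[i] - temps[i - 1])
    raise ValueError("normalized trace never crosses 50% from below")
```

Taking the first upward crossing avoids picking a point on the post-peak decline that is typical of dye-based melt curves.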

Crystallization and structure determination
Crystals of HLA-B*15:01-peptide complexes were grown by hanging-drop vapour diffusion at 20 °C. The protein:reservoir drop ratio was 1:1, at a concentration of 3 mg ml −1 in 10 mM Tris-HCl pH 8 (Fisher Bioreagents), 150 mM NaCl (Merck). Crystals of HLA-B*15:01-NQK-Q8 were grown in 0.2 M sodium formate pH 7.0 and 20% (w/v) polyethylene glycol 3350; and, for HLA-B*15:01-NQK-A8, in 20% (w/v) polyethylene glycol 3350 and 2% (v/v) ethylene glycol. Protein crystals were soaked in a cryoprotectant solution containing mother liquor solution with 20% (v/v) ethylene glycol and then flash-frozen in liquid nitrogen. The data were collected on the MX2 beamline at the Australian Synchrotron, part of ANSTO, Australia 65 . The data were processed using XDS 66

Reporting Summary
Last updated by author(s): May 31, 2023
Nature Portfolio wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency in reporting. For further information on Nature Portfolio policies, see our Editorial Policies and the Editorial Policy Checklist.


Software and code
Policy information about availability of computer code

Data collection
Data for COVID-19 testing and symptoms in the discovery cohort was collected through a smartphone app https://eureka.app.link/covid19/.
HLA genotyping from NGS data in the CHIRP/LIINC cohort was performed with commercial software packages: HLA Explorer V1.4 (Omixon, Budapest, Hungary) and AlloSeq Tx V471 (CareDx, Brisbane, USA). The crystallographic data were collected on the MX2 beamline at the Australian Synchrotron, part of ANSTO, Australia. The data were processed using XDS (version January 10, 2022) and the structures were determined by molecular replacement using the PHASER program (v2.8.3) from the CCP4 suite (v8.0.005) with a model of HLA-B*15:01 without the peptide (derived from PDB ID: 5TXS). Manual model building was conducted using COOT (v0.9.8.4) followed by refinement with BUSTER (version 1.2.14) and PHENIX (1.20.1-4487). The final models have been validated and deposited using the wwPDB OneDep System. Single-cell index sorting was performed on a BD FACSAria Fusion with BD FACSDiva Software v9.0. BD FACSymphony A3 was also used for cell acquisition with BD FACSDiva Software v9.0.

Data analysis
Initial testing for HLA associations was performed using the R package BIGDAWG V3.0.3. All additional association analysis was conducted in R using the base stats package as described in Methods, with the exception of the meta-analysis which used the R 'meta' package V 6.2-1. All molecular graphics representations were created using PyMOL V2.5. Post-acquisition analysis of T cells was performed using FlowJo software (v10). TRA and TRB sequences were analyzed with the software suite from the International ImMunoGeneTics (IMGT) Information System. The V(D)J gene nomenclature used is that of the IMGT database (www.imgt.org). Motifs enriched were identified with the MEME suite motif discovery software (v5.5.2.). TCR sequences were analyzed using FinchTV (Geospiza v1.5.0) and IMGT software. Additional data analysis for TCR was conducted using GraphPad Prism 9 (version 9.3).