Main

There is wide variability in the outcome of exposure to SARS-CoV-2, ranging from severe illness to asymptomatic infection, to those individuals who remain negative according to standard diagnostic tests. Recent studies have identified SARS-CoV-2 T cell reactivity in prepandemic samples5,6,7,8,9,10,11,15,16,17,18 and isolated cases of exposed individuals who have not seroconverted with single-time-point screening4,16,19,20,21,22. We studied an intensively monitored cohort of HCWs with potential exposure during the first UK pandemic wave (23 March 2020), comparing those with or without PCR and/or antibody evidence of SARS-CoV-2 infection. We postulated that, in HCWs for whom PCR and the most sensitive binding and neutralizing antibody tests remained repeatedly negative (SN-HCWs), T cell assays might distinguish a subset of SN-HCWs with a subclinical, rapidly terminated (abortive) infection. We hypothesized that these individuals would exhibit pre-existing memory T cells with cross-reactive potential, obviating the time required for de novo T cell priming and clonal expansion. In SN-HCWs, and in an additionally recruited cohort of medical students and laboratory staff with stored prepandemic samples that remained seronegative after close contact with cases, we had the opportunity to compare SARS-CoV-2-specific memory T cells with those that were already present in the same individual before, or at the time of, potential exposure.

We included an analysis of the understudied T cells directed against the core replication–transcription complex (RTC) within open reading frame 1ab (ORF1ab) (RNA polymerase co-factor non-structural protein 7 (NSP7), RNA polymerase NSP12 and helicase NSP13, hereafter the RTC); these are putative targets for pre-existing responses with pan-Coronaviridae reactivity, because they are likely to be highly conserved due to their key early roles in the viral life cycle. Consistent with this, in cases in which immunity against other viruses (including hepatitis B virus (HBV), hepatitis C virus (HCV), HIV and Japanese encephalitis virus (JEV)) has been described in exposed seronegative individuals, T cells were more likely to target non-structural proteins, such as polymerase, than in individuals with a seropositive infection23,24,25,26,27.

SARS-CoV-2 T cells in seronegative HCWs

We compared T cell reactivity in intensively monitored HCWs with a laboratory-confirmed infection or SN-HCWs, matched for exposure risk and demographic factors (COVIDsortium; Fig. 1a and Extended Data Table 1). Additional control cohorts included healthy adults who were sampled in London, UK, or Singapore before SARS-CoV-2 circulation in humans (prepandemic cohort; Fig. 1a). SN-HCWs were defined by negative weekly diagnostic tests (baseline–week 16, SARS-CoV-2 PCR, nasopharyngeal swab; anti-spike-1 IgG and anti-nucleoprotein (NP) IgG/IgM seroassays28; Fig. 1b–d). Having previously reported a range of neutralizing antibody titres at week 16 in laboratory-confirmed infections, we examined neutralizing antibodies in SN-HCWs. Two HCWs with neutralizing antibody titres that were just above the threshold were excluded from further analyses; the remaining SN-HCWs were negative by pseudotype assay (Fig. 1e), with a subset also confirmed to be negative at three time points for authentic virus neutralization (Extended Data Fig. 1a). SN-HCWs could have become PCR negative by recruitment; however, non-seroconverters after PCR positivity were rare (2.6% of PCR-positive HCWs negative by all three seroassays16) and antibody responses are unlikely to have waned before recruitment28. Furthermore, SN-HCWs lacked detectable SARS-CoV-2 spike-specific memory B cells, which we have shown persist after waning of neutralizing antibodies29 (Extended Data Fig. 1b; below the detection threshold). Thus, the SN-HCWs represented a cohort of intensely monitored HCWs who resisted classical laboratory-confirmed infection.

Fig. 1: SARS-CoV-2-specific T cells in SN-HCWs.

a, Design of the HCW and prepandemic cohorts. nAb, neutralizing antibodies. b, Cycle threshold values for the E gene PCR analysis in SN-HCWs and HCWs with a laboratory (lab)-confirmed infection (undetectable at 40 cycles was assigned 41). c, d, Anti-spike S1 (c) and anti-NP antibody (d) titres in SN-HCWs (baseline to week 16; n = 58; dotted lines at assay positivity cut-off and at average peak (AvPos) response in laboratory-confirmed infection). e, Pseudovirus neutralization at week 16. The crossed circles represent individuals who were excluded from the SN-HCW group (IC50 > 50). f, SARS-CoV-2 proteome highlighting RTC and structural regions assayed for T cell responses (peptide subpools are identified by the numbered boxes) and the number of overlapping 15-mer peptides (or mapped epitope peptides (MEP) for spike). g–j, IFNγ ELISpot analyses. g, h, Viral proteins recognized by individuals coloured by specificity (g) and the number of viral proteins targeted by group (h). i, j, The magnitude of the T cell response coloured by viral protein (i) and the cumulative magnitude of the T cell response by group (j). The red bar shows the geometric mean. For e, h, the red bar shows the median. For h, j, statistical analysis was performed using Kruskal–Wallis tests with Dunn’s correction. M, membrane; SFCs, spot-forming cells. For b–e, g–j, participants were from the COVIDsortium HCW cohort.

We quantified SARS-CoV-2-specific memory T cells by ELISpot using unbiased stimulation with overlapping peptides covering structural proteins and the less-well-studied non-structural proteins of the RTC (Fig. 1f). As previously described, when using sensitive assays5,6,7,9,17,18 (such as 400,000 peripheral blood mononuclear cells (PBMCs) per well IFNγ ELISpot analysis used here8,16), some SARS-CoV-2-reactive T cells were detectable in prepandemic samples; however, their multispecificity was significantly lower compared to the week 16 group with a laboratory-confirmed infection (Fig. 1g, h; structural responses at week 16 previously reported16). By contrast, SN-HCWs had SARS-CoV-2-specific T cells that were comparable in breadth to infected HCWs at week 16 and significantly more multispecific than prepandemic samples (Fig. 1g, h). The SARS-CoV-2-specific T cells of SN-HCWs targeted more protein pools and had an approximately fivefold higher cumulative magnitude of responses compared with those of the prepandemic cohort, with an overall strength equivalent to the infected cohort at week 16 (Fig. 1i, j).
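
As a rough illustration of how ELISpot readouts of this kind can be summarized, the sketch below scales a well count to spots per 10⁶ PBMCs and then derives breadth (number of pools recognized) and cumulative magnitude. Only the figure of 400,000 cells per well comes from the text; the background subtraction, the positivity cut-off, the function names and all numbers in the usage note are assumptions made for illustration, not the study's analysis pipeline.

```python
from typing import Dict, Tuple

CELLS_PER_WELL = 400_000  # PBMCs per well, as stated in the text


def sfc_per_million(spots: float, background: float,
                    cells_per_well: int = CELLS_PER_WELL) -> float:
    """Background-subtracted spot count scaled to 10^6 PBMCs.

    Assumption: the unstimulated-well count is subtracted and negative
    results are floored at zero.
    """
    net = max(spots - background, 0.0)
    return net * 1_000_000 / cells_per_well


def breadth_and_magnitude(responses: Dict[str, float],
                          positive_cutoff: float) -> Tuple[int, float]:
    """Return (number of positive pools, summed magnitude of positive pools).

    `positive_cutoff` is a hypothetical threshold in SFCs per 10^6 PBMCs.
    """
    positive = [v for v in responses.values() if v >= positive_cutoff]
    return len(positive), sum(positive)
```

For example, 30 spots in a stimulated well against 2 spots of background gives `sfc_per_million(30, 2)`, that is 28 × 2.5 = 70 SFCs per 10⁶ PBMCs.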

T cells from prepandemic samples tended to not target both halves of NP (NP1 and NP2 subpools), whereas around 50% of SN-HCWs and HCWs with a laboratory-confirmed infection did, confirming our earlier suggestion8 that this serves as a simple proxy measure of a multispecific response (Extended Data Fig. 1c–e). Taken together, we found a higher magnitude and breadth of SARS-CoV-2-specific T cells in HCWs who repeatedly tested PCR and antibody negative compared with individuals in the prepandemic cohort.

RTC-specific T cells and IFI27 in SN-HCWs

We next investigated whether T cell memory differs in SN-HCWs versus HCWs with laboratory-confirmed infection. Anti-viral T cells recognizing influenza A, Epstein–Barr virus (EBV) and cytomegalovirus (CMV) (together, FEC) were equivalent between the three cohorts (Extended Data Fig. 2a). However, the relative immunodominance of T cells against SARS-CoV-2 structural versus RTC proteins differed between the groups. The laboratory-confirmed-infection group had more responses to structural proteins (spike, membrane, NP and ORF3a) than to RTC (NSP7, NSP12, NSP13) (Fig. 2a, b). Memory T cells against structural proteins tended to positively correlate with viral load, whereas RTC responses did not show this association (Extended Data Fig. 2b). By contrast, T cells of the SN-HCWs targeted both structural and RTC regions, with significantly more RTC-specific T cells compared with either the infected or prepandemic groups (Fig. 2a and Extended Data Fig. 2c, d). Prepandemic samples had a ratio of RTC to structural responses that did not differ significantly from that in SN-HCWs (Fig. 2b), pointing to a possible influence of pre-existing responses on the pool of T cells expanding in SN-HCWs. A further small group (10%) of HCWs had PCR-confirmed infection but lacked detectable neutralizing antibodies at week 16, and some of the individuals in this group also lacked binding antibodies; this subgroup was similarly enriched for RTC-reactive T cells (Extended Data Fig. 2e, f). Taken together, this suggests that the structural proteins, which are abundantly produced during active infection, are dominant T cell targets after mild infection, whereas T cells in SN-HCWs preferentially focus on the RTC.
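
The RTC-to-structural comparison above reduces to a per-donor ratio and the share of a cohort in which that ratio exceeds 1. The following is a minimal sketch of that bookkeeping; the function names, the handling of donors with no structural response and the example values are assumptions, not taken from the study.

```python
from typing import List, Tuple


def rtc_to_structural_ratio(rtc: float, structural: float) -> float:
    """Ratio of summed RTC response to summed structural response for one donor."""
    if structural == 0:
        # Assumption: with no structural response the ratio is treated as
        # unbounded when an RTC response exists, and as 0 when neither exists.
        return float("inf") if rtc > 0 else 0.0
    return rtc / structural


def percent_rtc_dominant(donors: List[Tuple[float, float]]) -> float:
    """Percentage of donors whose ratio is above 1 (RTC > structural)."""
    ratios = [rtc_to_structural_ratio(rtc, structural) for rtc, structural in donors]
    return 100.0 * sum(ratio > 1 for ratio in ratios) / len(ratios)
```

With four hypothetical donors given as (RTC, structural) pairs, `percent_rtc_dominant([(120, 40), (10, 200), (50, 0), (0, 0)])` counts the first and third donors as RTC-dominant.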

Fig. 2: RTC-specific T cell and IFI27 signature in SN-HCWs.

a, b, IFNγ ELISpot analysis at week 16. a, The magnitude of T cell response to structural regions and the RTC. b, The ratio of the T cell response to the RTC versus structural regions. The percentage of the cohort with a ratio above 1 (RTC > structural) is shown below. For a, b, the red bar shows the geometric mean. c, IFI27 transcript signal by reverse transcription PCR (RT–PCR) in unexposed prepandemic samples (n = 59), baseline (BL) samples in HCWs who remained PCR negative and seronegative throughout follow-up (n = 99), SN-HCWs with weak (n = 5, <50 SFCs per 10⁶ PBMCs; Extended Data Fig. 4a) or strong (n = 15, >50 SFCs per 10⁶ PBMCs) RTC-specific T cells (baseline and peak signal (weeks 0–5)), and HCWs at the time of PCR positivity (PCR+). d, The longitudinal IFI27 signal in SN-HCWs with weak or strong RTC-specific T cell responses (n values as in c). For c, d, the red bar shows the median, with 2 s.d. either side of the prepandemic cohort mean highlighted in grey; the percentage with raised IFI27 above the mean + 2 s.d. is indicated below. Statistical analysis was performed using Kruskal–Wallis analysis of variance (ANOVA) with Dunn’s correction (a–d). Mann–Whitney paired t-test for paired BL versus peak (c). For a–d, participants were from the COVIDsortium HCW cohort.

To confirm the T cell identity of ELISpot responses in SN-HCWs at week 16, we expanded them with RTC peptides and detected both CD4+ and CD8+ SARS-CoV-2-specific T cells dividing (CellTrace violet (CTV) dilution) and producing IFNγ (Extended Data Fig. 3a and Extended Data Table 2). Their post-expansion frequencies tended to be lower than control influenza A/EBV/CMV-specific responses in the same donors but proportional to their differing ex vivo frequencies, indicating comparable proliferative potential (Extended Data Fig. 3b). In vitro-expanded RTC-specific T cells in SN-HCWs were also highly functional, producing multiple cytokines in tandem (Extended Data Fig. 3c, d). Most of the SARS-CoV-2-specific T cells expanded from SN-HCWs were CD4+; however, CD8+ T cells were also detectable in the majority of individuals (Extended Data Fig. 3e).

Our data raised the possibility that SARS-CoV-2 infection in HCWs represents a spectrum, with some SN-HCWs expanding T cells as a result of a subclinical infection that was not detectable by PCR or antibody seroconversion. To test this postulate, we measured the interferon-inducible transcript IFI27 in the blood, which has recently been shown to detect SARS-CoV-2 infection at, or one week before, PCR positivity (specificity of 0.95 and sensitivity of 0.84)14. Of the 25% of SN-HCWs with the strongest post-exposure RTC-specific T cell responses (Extended Data Fig. 4a), 40% (that is, 10% of SN-HCW group) already had IFI27 levels at recruitment that were above the threshold set on the basis of a cohort of unexposed prepandemic samples, although their levels tended to be lower than those in individuals with a laboratory-confirmed infection (Fig. 2c). To further estimate the frequency of abortive infections we tested a larger cohort of 99 unselected SN-HCW baseline samples, and found that a comparable proportion (9.1%) had IFI27 induction above the prepandemic threshold (Fig. 2c). The IFI27 signal peaked above the prepandemic threshold in 93.3% of those with strong RTC-specific T cells over weeks 0–5, but in none with weak or undetectable RTC-specific T cells (Fig. 2c). IFI27 levels showed a cumulative increase, peaking at 3–5 weeks after the UK lockdown (23 March 2020) (Fig. 2d; by which time all of the first-wave laboratory-confirmed infections had occurred (Fig. 1b)). By contrast, IFI27 was unchanged over weeks 0–5 in SN-HCWs with weak or absent RTC-specific responses, resulting in a lower IFI27 slope and variance (Fig. 2d and Extended Data Fig. 4b, c). The peak IFI27 level correlated with NSP7 T cells at week 16, with the latter correlating more strongly with NSP12 and other RTC-specific responses compared with structural responses. 
Neither IFI27 nor T cell specificity correlated with age, sex or other demographic factors, such as exposure type, in this small cohort (Extended Data Fig. 4d and Extended Data Table 1).
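
The IFI27 comparison rests on a threshold derived from the unexposed prepandemic cohort (the mean plus 2 s.d., per the Fig. 2 legend) and on the proportion of each group exceeding it. A minimal sketch of that calculation follows; the use of the sample standard deviation, the function names and the toy values in the usage note are assumptions, and no study data are reproduced here.

```python
from statistics import mean, stdev
from typing import Sequence


def upper_threshold(reference: Sequence[float], n_sd: float = 2.0) -> float:
    """Mean of the reference cohort plus n_sd standard deviations.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    return mean(reference) + n_sd * stdev(reference)


def fraction_above(values: Sequence[float], threshold: float) -> float:
    """Fraction of values strictly above the threshold."""
    return sum(v > threshold for v in values) / len(values)


# Arithmetic stated in the text: 40% of the top 25% of SN-HCWs
# corresponds to 10% of the whole SN-HCW group.
share_of_group = 0.25 * 0.40
```

With a toy reference cohort of `[1, 2, 3, 4, 5]`, the threshold is 3 + 2 × 1.58 ≈ 6.16, so `fraction_above([5, 7, 8, 6], upper_threshold([1, 2, 3, 4, 5]))` counts two of the four values as raised.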

In summary, during a period of high transmission at the start of the first UK pandemic wave, a low-level systemic interferon response indicative of virus exposure was detectable selectively in individuals who had the strongest SARS-CoV-2-specific T cells after exposure, despite them lacking PCR or antibody confirmation of infection. Extrapolating from previous data showing that IFI27 is induced at the time of incident infection and correlates with viral load14, this is consistent with a low-level infection among SN-HCWs with stronger RTC-specific T cell responses.

Targeting of conserved RNA polymerase

A transient/abortive infection that is not detectable by PCR or seroconversion could conceivably result from a lower viral inoculum and/or from a more efficient innate and/or adaptive immune response. The latter would be favoured by pre-existing memory T cells with the potential to expand rapidly after cross-recognition of early viral products of SARS-CoV-2 replication. Early T cell proliferation and T-cell-receptor clonal expansion, even before the virus is detectable, has been observed during mild SARS-CoV-2 infection17,30 and expansion of virus-specific T cells predates antibody induction after mRNA vaccination2,31. Having found that the SN-HCW group is enriched for SARS-CoV-2-specific T cells, particularly against RTC, we investigated the possibility that some of these represented expansions of pre-existing cross-reactive responses.

Probable candidates for the source of pre-existing T cells that cross-recognize SARS-CoV-2 are previous infections with closely related human endemic common cold coronaviruses (α-HCoV 229E, NL63 and β-HCoV HKU1, OC43). We bioinformatically determined the sequence homology of all possible SARS-CoV-2-derived 15-mer peptides to a curated set of HCoV sequences (Supplementary Table 1). RTC proteins, which are expressed at the first stage of the SARS-CoV-2 life cycle13, had 15-mer sequences of high homology to HCoV32,33 (Fig. 3a). In particular, NSP7-, NSP12- and NSP13-derived 15-mers had 6.3%, 29.9% and 31.0% higher average sequence homology to the four HCoV species, respectively, compared with structural-protein-derived 15-mers (all P < 0.001; Fig. 3b). NSP12, which was the largest of these proteins, represented the region with the most homology overall among human-infecting Coronaviridae. We further assessed the diversity across global circulating SARS-CoV-2 sequences (13,785, representative subsample of 611,893 sequences, GISAID, 27 July 2021; Extended Data Fig. 5a) using Nei’s genetic diversity index and an estimate of the minimal number of independent mutational events (homoplasies) at any nucleotide. By both metrics, the RTC proteins NSP12 and NSP13 were among the most conserved across SARS-CoV-2 clades (Fig. 3c and Extended Data Fig. 5b, d) and were significantly more conserved than many structural proteins (Extended Data Table 3).
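
To make the two conservation measures concrete, the sketch below enumerates all 15-mers of a sequence, scores each against a reference sequence, and computes Nei's gene diversity for one alignment column. It is a simplified stand-in: the homology score here is the best ungapped percent identity over all placements, and the diversity is the basic form 1 − Σpᵢ² without small-sample correction. Both choices, and all function names, are assumptions; the study's actual alignment method and estimator are not specified in this section.

```python
from collections import Counter
from typing import Iterator


def kmers(seq: str, k: int = 15) -> Iterator[str]:
    """Yield every overlapping k-mer of seq, stepping one residue at a time."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def best_identity(peptide: str, reference: str) -> float:
    """Highest fraction of matching positions over all ungapped placements
    of peptide along reference (0.0 if reference is shorter than peptide)."""
    k = len(peptide)
    best = 0.0
    for window in kmers(reference, k):
        matches = sum(a == b for a, b in zip(peptide, window))
        best = max(best, matches / k)
    return best


def nei_diversity(column: str) -> float:
    """Nei's gene diversity at one alignment column: 1 - sum(p_i ** 2),
    where p_i is the frequency of each observed character."""
    n = len(column)
    counts = Counter(column)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())
```

A column in which every sequence carries the same nucleotide scores 0, whereas a column split evenly between two nucleotides scores 0.5; averaging `best_identity` over all 15-mers of a protein gives a per-protein homology figure of the kind compared in the text.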

Fig. 3: Cross-reactive T cells targeting conserved RNA polymerase.

a, Sequence homology of SARS-CoV-2-derived peptide sequences to HCoV sequences. The columns show 15-mer SARS-CoV-2-derived peptides. The rows show HCoV genome records. Cells are coloured by the level of homology of the 15-mer to a particular HCoV proteome. Cells with no fill indicate that a sequence homology of <40% was observed. b, The average sequence homology of 15-mers covering SARS-CoV-2 proteins, or regions (pink, structural (S, M, NP and ORF3a); black, RTC (NSP7, NSP12 and NSP13)), to HCoV sequences. Viral proteins that were not assayed for T cell responses are shown in grey. c, The nucleotide diversity along the SARS-CoV-2 genome estimated with Nei’s genetic diversity index across each viral protein for all SARS-CoV-2 clades (subsampling; Extended Data Fig. 5a). d, e, IFNγ ELISpot analysis of the magnitude of T cell responses to individual SARS-CoV-2 proteins in unexposed prepandemic samples (d) and SN-HCWs at week 16 (e). The frequency of responders is shown as doughnut charts above. The bar shows the geometric mean. ND, not done. Statistical analysis was performed using Kruskal–Wallis tests with Dunn’s correction. Participants were from the COVIDsortium HCW cohort.

Importantly, the highly conserved RNA polymerase (NSP12) was also the region among those tested in prepandemic samples that was most commonly targeted by T cells, with the highest average magnitude and frequency of responders (Fig. 3d). Notably, the same preferential targeting of NSP12 was observed in a geographically distinct cohort of prepandemic samples from Singapore (Fig. 3d). Pre-existing T cells had the potential to recognize all of the viral antigens tested, including those with less conservation across HCoV, as previously described5,7,17,34. Responses against these regions were further enriched in SN-HCWs (Fig. 3d, e; Mann–Whitney U-test, P < 0.0001 for all except for ORF3a (P = 0.0006) and NSP13 (P = 0.0003)), suggesting many sources of pre-existing and de novo responses contribute to T cell memory in exposed seronegative individuals. Despite potential demographic confounding factors between cohorts (Extended Data Table 1), as with prepandemic samples, T cells of SN-HCWs preferentially targeted NSP12 (Fig. 3e). Thus, the viral protein that is most commonly targeted by pre-existing T cells is also the largest conserved region between Coronaviridae, suggesting exposure to HCoV is one probable source of cross-reactive T cells.

To further examine the potential for cross-reactivity due to previous infection with seasonal HCoV, we mapped new and previously described6,8,18,35 RTC-specific CD4+ and CD8+ T cell epitopes in SN-HCWs, revealing high sequence conservation with HCoV (Extended Data Table 4 and Extended Data Fig. 6a, b). We identified cross-reactivity against the HLA-A*02:01 restricted epitope in NSP7. A subset of T cells co-stained with MHC class I pentamers loaded with SARS-CoV-2 and HKU1 sequence peptide ex vivo, and bound to SARS-CoV-2 peptide-loaded pentamer after expansion for 10 days with either peptide (Extended Data Fig. 6c). T cells from 3 out of 5 HLA-A*02:01+ SN-HCWs tested had stronger responses to the HKU1 sequence than to other seasonal HCoV or SARS-CoV-2 (Extended Data Fig. 6d, e). This suggested that previous HKU1 infection primed these NSP7 responses that are able to cross-recognize the SARS-CoV-2 sequence, albeit with reduced efficiency. HLA-B*35+ SN-HCWs also showed variable cross-recognition of seasonal HCoV variant sequences of an NSP12 epitope (Extended Data Fig. 6f).

An alternative explanation for expanded T cells with cross-reactive potential in SN-HCWs is an infection with a seasonal coronavirus during the first wave of SARS-CoV-2 infections in London. As expected, all HCWs had detectable anti-spike IgG against the four endemic HCoV and, as previously described36, spike-specific antibodies against betacoronavirus OC43 were increased in those with PCR-detectable infection and SARS-CoV-2-specific seroconversion (Extended Data Fig. 7). However, there was no difference in endemic HCoV titres in HCWs who had strong RTC-specific T cells and raised IFI27 compared with those with weak or absent RTC-specific responses (Extended Data Fig. 7), making it improbable that HCoV infection itself accounted for the SARS-CoV-2-reactive T cells that we detected in SN-HCWs.

In summary, RTC regions such as polymerase that are expressed in the first stage of the viral life cycle are highly conserved among HCoV and are preferentially targeted by T cells in prepandemic and SN-HCW samples. A subset of T cells from donors who were able to abort infection could cross-recognize SARS-CoV-2 and HCoV sequences at individual RTC epitopes, pointing to previous infection with HCoV as one source of pre-existing cross-reactive T cells.

Polymerase-specific T cells in abortive infection

To examine whether pre-existing cross-reactive and/or rapidly generated de novo RTC-specific T cells expand in vivo, we obtained paired PBMC samples before and after SARS-CoV-2 exposure. Medical students and laboratory staff (contact cohort, n = 23) who were sampled before the coronavirus disease 2019 (COVID-19) pandemic (winter 2018–2019), were resampled after close contact with individuals with SARS-CoV-2 infection, with or without IgG seroconversion and with or without PCR positivity (contact cohort; Extended Data Table 5). Parallel analysis of pre- and post-exposure/infection PBMCs demonstrated expansion of the RTC over structural responses in the close-contact seronegative group (Fig. 4a). By contrast, the group with serological confirmation of infection showed the expected in vivo expansion of pre-existing structural SARS-CoV-2-reactive T cells, with no significant increase in RTC-specific T cells (Fig. 4a and Extended Data Fig. 8a). We observed in vivo expansion of pre-existing NSP12 responses in 4 out of 5 individuals who remained seronegative after exposure to SARS-CoV-2, resulting in a significant increase in NSP12 but not control FEC responses (Extended Data Fig. 8b, c). Four out of five remaining seronegative close contacts had newly detected, presumed de novo, low-level responses after exposure (Extended Data Fig. 8c).

Fig. 4: In vivo expansion of polymerase-specific T cells in abortive infection.

a–e, IFNγ ELISpot analysis. a, The magnitude of the T cell response in seronegative individuals who had close contact with cases (green) or in seropositive individuals with infection (orange) to the RTC, structural proteins (Str), a summed total, and an influenza A, EBV and CMV (FEC) peptide pool (grey seronegative/seropositive combined), before and after exposure/infection. Data are mean ± s.e.m. P values are shown at the top. b, The change in magnitude of NSP12 T cell response between recruitment and post-exposure in SN-HCWs (subgroup with top 19 RTC responses at week 16; Extended Data Fig. 4a). Expanded, greater than twofold change. c, The magnitude of paired pre- and post-exposure T cell responses to individual 9–15-mer peptides (individual responses; Extended Data Fig. 8g) from RTC or the control FEC peptide pool in SN-HCWs (weeks 16–26, 11 responses from 9 SN-HCWs). CI, confidence intervals. d, The magnitude of the T cell response to individual SARS-CoV-2 proteins (top) and to subpools (~40 overlapping peptides; bottom) within the RTC at week 16 in HCWs with a laboratory-confirmed infection or SN-HCWs. e, Pre-existing NSP12-specific T cell responses in baseline samples from SN-HCWs and the laboratory-confirmed infection group (PCR positive after baseline or seroconversion at least 4 weeks after recruitment). The doughnut plot above shows frequency. For c–e, the red lines (c, e) and bars (d) show the geometric mean. Statistical analysis was performed using Wilcoxon tests (a, c), Mann–Whitney U-test and Fisher’s exact test (d, e). For a, participants were from the contact cohort (Extended Data Table 5). For b–e, participants were from the COVIDsortium HCW cohort (Extended Data Table 1).

We next reverted to the SN-HCW group, in which small volume PBMC collections were available from the time of recruitment, enabling the targeted analysis of baseline T cells in those with the strongest RTC responses at week 16. NSP12-specific T cells were already detectable at the baseline in 79% of those SN-HCWs with the strongest NSP12 responses after exposure (Fig. 4b). NSP12 responses expanded in vivo on average 8.4-fold between recruitment and week 16, with no corresponding change in FEC responses (Fig. 4c and Extended Data Fig. 8d). We confirmed the expansion at week 16 of pre-existing RTC-specific T cells at the subpool (Extended Data Fig. 8e, f) and individual peptide (Fig. 4c and Extended Data Fig. 8g) levels. Moreover, many T cells were newly detected after exposure (Extended Data Fig. 8g), reflecting either de novo priming or expansion of responses that were previously below the limit of assay detection (example of expanded response undetectable by ex vivo ELISpot; Extended Data Fig. 8h). All of the HCWs with newly detected or expanding/contracting NSP12-specific T cells had both NP1- and NP2-reactive T cells after exposure (Extended Data Fig. 8i), whereas only 2 out of 5 individuals with no change in NSP12 had these specificities, suggesting that they may not have had the same level of SARS-CoV-2 exposure. The fold change in NSP12 between recruitment and the week 16 follow-up correlated with the total SARS-CoV-2 response, supporting its use to identify those seronegative individuals with expanded T cell immunity after exposure (Extended Data Fig. 8j).
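
The paired baseline and week 16 comparison can be expressed as a fold change per donor. The Fig. 4 legend defines an expanded response as a greater than twofold change; the sketch below applies that rule and, as assumptions of this example only, treats a greater than twofold decrease as contracted and a response absent at baseline but present at follow-up as newly detected. The function name and category labels are hypothetical.

```python
def classify_response(baseline: float, followup: float,
                      fold_threshold: float = 2.0) -> str:
    """Label a paired response by its fold change between two time points.

    Inputs are magnitudes in SFCs per 10^6 PBMCs. Only the 'expanded'
    rule (greater than twofold) comes from the figure legend; the other
    categories are assumptions for illustration.
    """
    if baseline <= 0:
        # A fold change is undefined without a detectable baseline response.
        return "newly detected" if followup > 0 else "undetectable"
    fold = followup / baseline
    if fold > fold_threshold:
        return "expanded"
    if fold < 1.0 / fold_threshold:
        return "contracted"
    return "unchanged"
```

For instance, a hypothetical donor rising from 10 to 84 SFCs per 10⁶ PBMCs has an 8.4-fold change and is labelled expanded, whereas one moving from 10 to 12 is labelled unchanged.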

Finally, we examined whether there was a preferential enrichment of RTC-specific responses in SN-HCWs compared with HCWs with a laboratory-confirmed infection at week 16. Notably, the RNA polymerase NSP12 and its cofactor NSP7 were the only proteins that induced higher-magnitude T cell responses in seronegative individuals in whom detectable infection was not established compared with those with overt infection (Fig. 4d). T cells in SN-HCWs targeted a larger number of regions of NSP12 (subpools of about 40 overlapping 15-mers; Fig. 1g) compared with T cells in the prepandemic or seropositive cohorts (Extended Data Fig. 8k). T cells targeting several regions of NSP12 and other RTC pools were enriched in SN-HCWs compared to HCWs with a laboratory-confirmed infection (Fig. 4d (bottom)). To examine whether the reduced frequency of NSP12-specific T cells in the 16 week memory response of those with laboratory-confirmed infection was reflective of their repertoire at the time of encountering SARS-CoV-2, we obtained baseline PBMCs from a subset of individuals who were sampled before PCR positivity or more than 4 weeks before seroconversion. NSP12-specific T cells were already significantly lower at the baseline in those who went on to develop laboratory-confirmed infection compared with in SN-HCWs (Fig. 4e and Extended Data Fig. 8l, m), supporting a potential role in protection from PCR-detectable infection and seroconversion.

Conclusions

We provide T cell and innate transcript evidence for abortive, seronegative SARS-CoV-2 infection. Longitudinal samples from SN-HCWs and an additional cohort showed that RTC-specific (particularly polymerase) T cells were enriched before exposure, expanded in vivo and preferentially accumulated in those in whom SARS-CoV-2 failed to establish infection compared with those with overt infection.

The differential biasing of T cells towards early-expressed non-structural SARS-CoV-2 proteins in HCWs without seroconversion may reflect repetitive occupational exposure to very low viral inocula, reported to drive the induction of non-structural T cells in HIV, simian immunodeficiency virus (SIV) and HBV26,37,38. Such repetitive exposure would be congruent with the observed protracted induction of the innate signal IFI27 and the development of de novo T cells in some SN-HCWs.

However, we also documented the expansion of pre-existing T cells, with responses that are capable of cross-recognizing epitope variants between seasonal HCoV and SARS-CoV-2. Cross-reactive SARS-CoV-2-specific CD8+ T cells directed against epitopes that are highly conserved among HCoV are now well described, with pre-existing T cells frequently targeting essential viral proteins with low scope for tolerating mutational variation, such as those in ORF1ab6,18,32. The abundant SARS-CoV-2-specific CD4+ T cells may also contribute to protection in SN-HCWs by antibody-independent mechanisms, such as antiviral cytokine and chemokine production. HCWs have higher frequencies of HCoV-reactive T cells compared with the general public19, and recent HCoV infection is associated with a reduced risk of severe COVID-19 infection39, probably in part attributable to cross-reactive neutralizing antibodies40,41; however, pre-existing T cells have also been implicated15,42. The early induction of T cells, before detectable antibodies in mild infection30 and concurrent with mRNA vaccination efficacy, supports a role for pre-existing cross-reactive memory T cells2,31.

Pre-existing RTC-specific T cells, at a higher frequency than naive T cells and poised for immediate reactivation on antigen cross-recognition, would be expected to favour early control, explaining their enrichment after abortive compared to classical infection. However, the relative contribution of viral inoculum and cross-reactive T cells needs to be further dissected in human challenge experiments and animal models. A caveat of this work is that we analysed only peripheral immunity; it is plausible that mucosal-sequestered antibodies43 had a role in our seronegative cohort. It also remains possible that innate immunity mediates control in abortive infections, with RTC-biased T cell responses being generated as a biomarker of low-grade infection. Interferon-independent induction of RIG-I has been proposed to abort SARS-CoV-2 infection by restraining the viral lifecycle before sgRNA production13; this would favour the presentation of epitopes from ORF1ab, released into the cytoplasm in the first stage of the viral life cycle12, while blocking the production of structural proteins from pregenomic RNA. This raises the possibility that some SARS-CoV-2-infected cells could be recognized and removed by ORF1ab-reactive T cells without widespread production of structural proteins and mature virion formation.

We have described the induction of innate and cellular immunity without seroconversion, highlighting a subset of individuals in whom the risk of SARS-CoV-2 reinfection and immunogenicity of vaccines should be specifically assessed. The HCWs whom we studied were exposed to Wuhan Hu-1 and had partial protection from personal protective equipment; it remains to be seen whether abortive infections can occur after exposure to more infectious variants of concern, or in the presence of vaccine-induced immunity. However, clearance without seroconversion points to T cells that may be particularly effective vaccine targets. Cross-protection between coronaviruses is proportional to their sequence homology in mice44, making the highly conserved NSP12 region studied here, as well as the less-studied NSP3/14/16, top candidates for heterologous immunity. Our data highlight the presence of pre-existing T cells in a proportion of donors that are able to expand in vivo and target a highly conserved region of SARS-CoV-2 and other Coronaviridae. The boosting of such T cells may offer durable pan-Coronaviridae reactivity against endemic and emerging viruses, arguing for their inclusion and assessment as an adjunct to spike-specific antibodies in next-generation vaccines.

Methods

COVIDsortium healthcare worker participants

The COVIDsortium bioresource was approved by the ethical committee of UK National Research Ethics Service (20/SC/0149) and registered at https://ClinicalTrials.gov (NCT04318314). Full study details of the bioresource (participant screening, study design, sample collection and sample processing) have previously been described16,45.

In this cohort and in London as a whole, infections in the first pandemic wave peaked during the first week of lockdown (23 March 2020)46, and we observed approximately synchronous exposure coincident with recruitment; we therefore used this as the benchmark for assessing exposure-generated immunity. Across the main study cohort, 48 participants had positive RT–PCR results and 157 (21.5%) were seropositive. Furthermore, 79% of positive PCR tests were within the first 2 weeks of follow-up and no HCWs tested PCR positive after week 5 of follow-up14,46 (Fig. 1b), with seroconversion within the first 3 weeks of follow-up for most28. Infections were asymptomatic or mild with only two hospital admissions (none requiring intensive care admission). The cross-sectional case-controlled substudy (n = 129) collected samples at 16–18 weeks after the first UK lockdown (Fig. 1a). Power calculations were performed before week 16 substudy sampling to determine the sample size needed to test the hypothesis that HCWs with pre-existing T cell responses are enriched in the exposed seronegative group at a range of incidences of infection, assuming 50% of the total cohort had pre-existing T cell responses. Sample sizes of 18–64 per group were estimated. An age-, sex- and ethnicity-matched nested substudy was designed within the larger (n = 731) parent study and 129 participants attended for week 16 sampling, including high-volume PBMC isolation.

Laboratory-confirmed infection was determined by weekly nasopharyngeal RNA stabilizing swabs and RT–PCR (Roche cobas SARS-CoV-2 test, envelope (E) gene) and antibody assay positivity (spike protein 1 IgG Ab assay, EUROIMMUN) and anti-nucleocapsid total antibody assay (Roche) described in detail below. The seronegative HCW group was matched for demographics and exposure to the laboratory-confirmed infected group and was defined by negativity in these 3 tests at all 16 time points as well as negative for neutralizing antibodies at week 16 and at selected prior time points as indicated.

The cohort of medical students and laboratory staff was approved by UCL Ethics (project ID:13545/001) and prepandemic samples from healthy donors were collected and cryopreserved before August 2019 under ethics number 11/LO/0421. All participants provided written informed consent and the study conformed to the principles of the Helsinki Declaration.

Isolation of PBMCs and serum

PBMCs were isolated from heparinized blood samples using Pancoll (Pan Biotech) or Histopaque-1077 Hybri-Max (Sigma-Aldrich) density-gradient centrifugation in SepMate tubes (StemCell) according to the manufacturer’s specifications. Isolated PBMCs were cryopreserved in fetal calf serum (FCS) containing 10% DMSO and stored in liquid nitrogen.

Whole-blood samples were collected in SST vacutainers (Vacuette) with inert polymer gel for serum separation and clot activator coating. After centrifugation at 1,000g for 10 min at room temperature, the serum layer was aliquoted and stored at −80 °C. All T cell assays reported here were performed on cryopreserved PBMCs.

Weekly SARS-CoV-2 S1 and NP serology

Weekly Euroimmun anti-SARS-CoV-2 enzyme-linked immunosorbent assay (ELISA; anti-SARS-CoV-2 S1 antigen IgG) and Roche Elecsys anti-SARS-CoV-2 electrochemiluminescence immunoassay (ECLIA; anti-SARS-CoV-2 nucleoprotein IgG/IgM) commercial assays were performed by Public Health England as previously described16. S1 ELISA: a ratio of ≥1.1 was deemed to be positive. A ratio of 11 was taken to be the upper threshold as the assay saturates beyond this point. NP ECLIA: anti-NP results are expressed as a cut-off index (COI) value based on the electrochemiluminescence signal of a two-point calibration, with results COI ≥ 1.0 classified as positive.
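
The cut-offs above can be expressed as a short classification routine. The following is a minimal sketch, not the assay software: the function and variable names are hypothetical, and only the three numerical thresholds are taken from the text.

```python
# Thresholds quoted in the text; all names here are illustrative assumptions.
S1_POSITIVE_RATIO = 1.1    # S1 IgG ELISA: ratio >= 1.1 is positive
S1_SATURATION_RATIO = 11   # upper threshold: the assay saturates beyond this
NP_POSITIVE_COI = 1.0      # anti-NP ECLIA: COI >= 1.0 is positive


def classify_s1(ratio):
    """Return (capped ratio, is_positive) for one S1 ELISA reading."""
    capped = min(ratio, S1_SATURATION_RATIO)
    return capped, capped >= S1_POSITIVE_RATIO


def classify_np(coi):
    """Return True if one anti-NP reading is at or above the cut-off index."""
    return coi >= NP_POSITIVE_COI


def seronegative_throughout(s1_ratios, np_cois):
    """True only if every weekly S1 and NP reading is below its cut-off."""
    any_s1 = any(classify_s1(r)[1] for r in s1_ratios)
    any_np = any(classify_np(c) for c in np_cois)
    return not (any_s1 or any_np)
```

A participant would count as seronegative by these two assays only if `seronegative_throughout` holds across all weekly readings; PCR and neutralization results are assessed separately.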

Neutralization assays for the pseudotype and authentic virus

SARS-CoV-2 pseudotype neutralization assays were conducted using pseudotyped lentiviral particles as previously described16. In brief, serum was heat-inactivated at 56 °C for 30 min. Serum dilutions in DMEM were performed in duplicate with a starting dilution of 1 in 20 and 7 consecutive twofold dilutions to a final dilution of 1/2,560 in a total volume of 100 µl. SARS-CoV-2 pseudotyped lentiviral particles (1 × 10^5 RLU) were added to each well (serum dilutions and controls) and incubated at 37 °C for 1 h. Then, 4 × 10^4 Huh7 cells suspended in 100 μl complete medium were added per well and incubated for 72 h at 37 °C and 5% CO2. Firefly luciferase activity (luminescence) was measured using the Steady-Glo Luciferase Assay System (Promega) and a CLARIOStar Plate Reader (BMG Labtech). The curves of relative infection rates (as a percentage) versus the serum dilutions (log10-transformed values) against a negative control of pooled sera collected before 2016 (Sigma-Aldrich) and a positive neutralizer were plotted using Prism 9 (GraphPad). A nonlinear regression method was used to determine the dilution fold that neutralized 50% (IC50).
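
As a check on the dilution scheme, the series can be generated directly. This is an illustrative sketch only; the function name is hypothetical.

```python
def twofold_series(start, n_twofold_steps):
    """Reciprocal dilutions: the starting dilution plus n further twofold steps."""
    return [start * 2 ** k for k in range(n_twofold_steps + 1)]


pseudotype = twofold_series(20, 7)
print(pseudotype)  # [20, 40, 80, 160, 320, 640, 1280, 2560]
```

Starting at 1 in 20, seven twofold steps give eight dilutions ending at 1/2,560, in agreement with the stated final dilution.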

Authentic SARS-CoV-2 microneutralization assays were performed as previously described47. In brief, a mixture of serum dilutions in DMEM (1 in 20 and 11 consecutive twofold dilutions to a final dilution of 1/40,960) and 3 × 10^4 FFU of SARS-CoV-2 virus (Wuhan Hu-1) were incubated at 37 °C for 1 h. After initial incubation, preseeded Vero E6 cells were infected with the serum–virus samples and incubated (37 °C and 5% CO2) for 72 h. Cells were then fixed with 100 μl 3.7% (v/v) formaldehyde for 1 h. Cells were washed with PBS and stained with 0.1% (w/v) crystal violet solution for 10 min. After removal of excess crystal violet and air drying, the crystal violet stain was resolubilized with 100 μl 1% (w/v) sodium dodecyl sulfate solution. Absorbance readings were taken at 570 nm using a CLARIOStar Plate Reader (BMG Labtech). Absorbance readings for each well were standardized against technical positive (virus control) and negative (cells only) controls on each plate to determine a percentage neutralization value. A nonlinear regression (curve fit) method was used to determine the dilution fold that neutralized 50% (IC50) using Prism 9 (GraphPad). SARS-CoV-2 is classified as a hazard group 3 pathogen and therefore all authentic SARS-CoV-2 propagation and microneutralization assays were performed in a containment level 3 facility.
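
The plate-level standardization can be sketched as follows. The exact formula is not given in the text, so this is an assumption: because crystal violet stains surviving cells, the virus-only control is taken as 0% neutralization and the cells-only control as 100%. The function name and the example absorbances are hypothetical.

```python
def percent_neutralization(a_sample, a_virus_control, a_cells_only):
    """Scale a well's absorbance between the two plate controls (assumed formula).

    a_virus_control: absorbance with virus and no serum (maximal cell death).
    a_cells_only:    absorbance of uninfected cells (no cell death).
    """
    span = a_cells_only - a_virus_control
    if span == 0:
        raise ValueError("plate controls are indistinguishable")
    return 100.0 * (a_sample - a_virus_control) / span
```

Under this assumption a well reading midway between the two controls corresponds to roughly 50% neutralization; the IC50 itself would still come from a curve fit across the dilution series.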

Spike ELISA

Seropositivity against SARS-CoV-2 spike was determined for the medical student and laboratory staff cohort between July 2020 and January 2021 (Extended Data Table 5) by ELISA, as validated and described previously40,48,49. In brief, 9 columns of 96-half-well MaxiSorp plates (Thermo Fisher Scientific) were coated overnight at 4 °C with purified S1 protein in PBS (3 μg ml−1 per well in 25 μl); the remaining 3 columns were coated with goat anti-human F(ab)′2 (1:1,000) to generate an internal standard curve. The next day, plates were washed with PBS-T (0.05% Tween-20 in PBS) and blocked for 1 h at room temperature with assay buffer (5% milk powder PBS-T). Sera were diluted in blocking buffer (1:50). Serum (25 μl) was then added to S1-coated wells in duplicate and incubated for 2 h at room temperature. Serial dilutions of known concentrations of IgG were added to the F(ab)′2 IgG-coated wells in triplicate (Sigma-Aldrich). After incubation for 2 h at room temperature, the plates were washed with PBS-T, and 25 µl alkaline phosphatase-conjugated goat anti-human IgG (Jackson ImmunoResearch) at a 1:1,000 dilution in assay buffer was added to each well and incubated for 1 h at room temperature. Plates were then washed with PBS-T, and 25 µl of alkaline phosphatase substrate (Sigma-Aldrich) was added. Optical density values were measured using a MultiskanFC (Thermo Fisher Scientific) plate reader at 405 nm and S1-specific IgG titres were interpolated from the IgG standard curve using 4PL regression curve-fitting in GraphPad Prism 8.
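
Interpolation from a four-parameter logistic (4PL) standard curve amounts to inverting the fitted curve. The sketch below assumes one common 4PL parameterization, which may differ from the one Prism uses internally; the parameter names and the values in the usage note are hypothetical.

```python
def four_pl(x, a, b, c, d):
    """4PL response at concentration x (assumed parameterization).

    a: response at zero concentration, d: response at infinite concentration,
    c: concentration at the inflection point, b: slope factor.
    """
    return d + (a - d) / (1.0 + (x / c) ** b)


def inverse_four_pl(y, a, b, c, d):
    """Concentration whose predicted response is y; y must lie between a and d."""
    lo, hi = min(a, d), max(a, d)
    if not lo < y < hi:
        raise ValueError("response is outside the range of the standard curve")
    return c * ((a - d) / (y - d) - 1.0) ** (1.0 / b)
```

For example, with illustrative fitted values a = 0.05, b = 1.2, c = 10 and d = 3.0, an optical density of 1.525 (the midpoint of a and d) interpolates back to the inflection concentration of 10.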

HCoV spike meso-scale discovery immunoassay

A multiplexed meso-scale discovery immunoassay to measure anti-HCoV spike IgG antibodies was performed as previously described50. Plates were coated with 200–400 μg ml−1 spike protein (trimers in prefusion form) from the endemic human coronaviruses HKU1, OC43, 229E and NL63. Antibody concentration is presented in arbitrary units (AU) interpolated from the ECL signal of the internal standard sample using a four-parameter logistic curve fit. Serum samples taken at week 8 (the peak time point for spike S1 IgG after PCR-positive SARS-CoV-2 infection) were assayed for HCoV antibodies.

SARS-CoV-2 spike-specific memory B cell staining

Multiparameter flow cytometry was used for ex vivo identification of spike-specific memory B cells as previously described29. Biotinylated tetrameric spike (1 μg) was fluorochrome linked by incubating with streptavidin-conjugated APC (Prozyme) and PE (Prozyme) for 30 min in the dark on ice. PBMCs were thawed and incubated with Live/Dead fixable dead cell stain (UV, Thermo Fisher Scientific) and saturating concentrations of phenotyping monoclonal antibodies diluted in 50% 1× PBS and 50% Brilliant Violet Buffer (BD Biosciences): anti-CD3 Bv510 (BioLegend, OKT3, 1:200), anti-CD11c FITC (BD Biosciences, B-ly6, 1:100), anti-CD14 Bv510 (BioLegend, M5E2, 1:200), anti-CD19 Bv786 (BD Biosciences, HIB19, 1:50), anti-CD20 AlexaFluor700 (BD Biosciences, 2H7, 1:100), anti-CD21 Bv711 (BD Biosciences, B-ly4, 1:100), anti-CD27 BUV395 (BD Biosciences, L128, 1:100), anti-CD38 Pe-CF594 (BD Biosciences, HIT2, 1:200) and anti-IgD Pe-Cy7 (BD Biosciences, IA6-2, 1:100). For identification of SARS-CoV-2-antigen-specific B cells, 1 μg per 500 μl of stain each of tetrameric spike–APC and spike–PE was added to cells. Cells were incubated in the staining solution for 30 min at room temperature, washed with PBS and subsequently fixed with the FoxP3 Buffer Set (BD Biosciences) according to the manufacturer’s instructions. All of the samples were acquired on the BD Fortessa-X20 flow cytometer. Data were analysed using FlowJo v.10.7 (TreeStar). Example gating and positivity cut-off have previously been reported29. The magnitude of the SARS-CoV-2 spike-specific memory B cell population is expressed as a percentage of memory B cells (gated as: lymphocytes, singlets, live, CD3−CD14−CD19+, CD20+, excluding: CD38hi, IgD+ and CD27+CD21−) binding both PE- and APC-labelled spike.

SARS-CoV-2 peptides

Full lists of the peptides contained in pools of overlapping peptides covering structural16 and RTC proteins8 have previously been described (15-mer peptides overlapping by 10 amino acids, GL Biochem Shanghai, >80% purity). A list of peptides that overlap NSP12 is provided in Supplementary Table 3. For IFNγ ELISpot assays, SARS-CoV sequence peptides were used (96.5% sequence homology with Wuhan SARS-CoV-2 consensus sequence, 34/931 amino acids differ; Supplementary Table 3). For epitope mapping, SARS-CoV-2 sequence peptides were used for NSP12-2 and NSP12-5 (GL Biochem Shanghai, >80% purity).
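
The tiling scheme behind such pools (15-mers overlapping by 10 amino acids, that is, a window advancing by 5 residues) can be illustrated as follows. This is a sketch rather than the procedure used to design the actual pools: the function name is hypothetical, and the handling of a C-terminal remainder (a final peptide ending at the last residue) is an assumption.

```python
def tile_peptides(sequence, length=15, overlap=10):
    """Split a protein sequence into overlapping peptides of fixed length."""
    if not 0 <= overlap < length:
        raise ValueError("overlap must satisfy 0 <= overlap < length")
    if len(sequence) <= length:
        return [sequence]
    step = length - overlap
    starts = list(range(0, len(sequence) - length + 1, step))
    # Assumed convention: add a final window so the C-terminus is covered.
    last_start = len(sequence) - length
    if starts[-1] != last_start:
        starts.append(last_start)
    return [sequence[s:s + length] for s in starts]
```

The same function with `overlap=14` yields the single-residue sliding window used for the sequence homology analysis described later in the Methods.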

To limit competition for in vitro peptide presentation, we limited stimulations to a maximum of 55 peptides and have, therefore, divided large proteins such as NP into subpools: NP (NP1, NP2, 41 peptides each), M (43 peptides), ORF3a (53 peptides), NSP7 (15), NSP12 (36–37 per pool NSP12-1 to NSP12-5) and NSP13 (39–40 peptides per pool NSP13-1 to NSP13-3). Fifteen-mer peptides covering the predicted SARS-CoV-2 spike epitopes8 were used to give a total of 55 peptides in this pool (spike). Optimal 9-mer peptides for CD8+ epitopes were custom-synthesized by ThinkPeptides (>70% purity; Supplementary Table 3).

IFNγ ELISpot assay

The IFNγ ELISpot assay was performed as previously described on cryopreserved PBMCs8,16,51. Unless otherwise stated, culture medium for human PBMCs (R10) was sterile 0.22-μm-filtered RPMI medium (Thermo Fisher Scientific) supplemented with 10% by volume heat-inactivated (1 h, 64 °C) FCS (Hyclone) and 1% by volume 100× penicillin and streptomycin solution (Gibco-BRL).

ELISpot plates (Merck-Millipore, MSIP4510) were coated with human anti-IFNγ antibodies (1-D1K, Mabtech; 10 μg ml−1) in PBS overnight at 4 °C. The plates were washed six times with sterile PBS and blocked with R10 for 2 h at 37 °C with 5% CO2. PBMCs were thawed and rested in R10 for 3 h at 37 °C with 5% CO2 before being counted to ensure that only viable cells were included. PBMCs (400,000 per well) were seeded in R10 and were stimulated for 16–20 h with SARS-CoV-2 peptide pools (2 μg ml−1 per peptide) at 37 °C in a humidified atmosphere with 5% CO2. In cases in which insufficient T cells were available, NSP12 pools 1, 2 and 3, and NSP13 pools 1, 2 and 3 were combined into a single well. For baseline measurements, NSP12 pools 1–5 were stimulated in a single well and, in cases in which insufficient T cells were available, a single DMSO well was included. HCWs who did not have a full complement of stimulations were excluded from analysis of total magnitude or breadth of response, resulting in slightly lower n values. Internal plate controls were R10 alone (without T cells) and two DMSO wells (negative controls), concanavalin A (ConA, positive control; Sigma-Aldrich) and FEC (HLA I-restricted peptides from influenza, Epstein–Barr virus and CMV; 1 μg ml−1 per peptide). ELISpot plates were developed with human biotinylated IFNγ detection antibodies (7-B6-1, Mabtech; 1 μg ml−1) for 3 h at room temperature, followed by incubation with goat anti-biotin alkaline phosphatase (Vector Laboratories; 1:1,000) for 2 h at room temperature, both diluted in PBS with 0.5% BSA by volume (Sigma-Aldrich), and finally with 50 μl per well of sterile filtered BCIP/NBT Phosphatase Substrate (Thermo Fisher Scientific) for 7 min at room temperature. Plates were washed in double-distilled H2O and left to dry overnight before being read on the AID classic ELISpot plate reader (Autoimmun Diagnostika).

The average of two DMSO wells was subtracted from all peptide-stimulated wells for a given PBMC sample and any response that was lower in magnitude than 2 s.d. of these sample-specific DMSO control wells was not considered to be a peptide-specific response (given value 0). Results were expressed as IFNγ SFCs per 10^6 PBMCs after background subtraction. The geometric mean of all DMSO wells was 9.571 SFCs per 10^6 PBMCs (3.8 spots). We excluded the results if the negative control wells had >95 SFCs per 10^6 PBMCs or positive control wells (ConA) were negative. T cell responses to SARS-CoV-2 did not correlate with background spots in DMSO wells (for example, the SN-HCW group, Spearman r = −0.068, P = 0.6141).
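
The background correction can be sketched in a few lines. This is one reading of the rule above, not the analysis code used for the study: it assumes that the background-subtracted spot count is compared with twice the sample standard deviation of the two DMSO wells, and that spots are scaled from the 400,000 cells seeded per well. The function name is hypothetical.

```python
from statistics import mean, stdev

CELLS_PER_WELL = 400_000  # PBMCs seeded per well, as stated for the ELISpot assay


def net_sfc_per_million(peptide_spots, dmso_spots, cells_per_well=CELLS_PER_WELL):
    """Background-subtracted IFN-gamma SFCs per 10^6 PBMCs for one peptide well."""
    background = mean(dmso_spots)
    threshold = 2 * stdev(dmso_spots)  # assumption: sample s.d. of the DMSO wells
    net = peptide_spots - background
    if net < threshold or net < 0:
        return 0.0  # not considered a peptide-specific response
    return net * 1_000_000 / cells_per_well
```

With DMSO wells of 3 and 5 spots, a peptide well with 24 spots gives 20 net spots, or 50 SFCs per 10^6 PBMCs, whereas a well with 6 spots falls below the threshold and is set to 0.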

Antigen-specific T cell proliferation assay and epitope mapping

Frozen PBMCs were thawed and washed twice with sterile PBS. PBMCs were resuspended in 1 ml R10 culture medium (2–10 × 10^6 PBMCs) and 0.5 µl of 5 mM stock CTV (Thermo Fisher Scientific) was added per sample with mixing. PBMCs were stained in the dark for 10 min at 37 °C in a humidified atmosphere with 5% CO2. Ten-times volume of cold R10 was added to stop the staining reaction, and cells were incubated for 5 min on ice. Cells were washed in PBS and incubated for 5 min at 37 °C before being transferred to a new tube and were washed again in R10. CTV-stained PBMCs were plated in 96-well plates (2–4 × 10^5 PBMCs in 200 µl R10) and stimulated with peptide pools (2 μg ml−1 per peptide) for 10 days in R10 supplemented with 0.5 μg ml−1 soluble anti-CD28 antibodies (Thermo Fisher Scientific) and 20 U ml−1 recombinant human IL-2 (Peprotech). CTV-stained and unstained PBMCs were run to confirm efficiency of staining. Then, 100 µl medium was added on day 1, and 100 µl medium was removed and replaced with R10 supplemented with anti-CD28 and IL-2 as above on days 3 and 6. On day 9, PBMCs were restimulated with peptide pools (2 µg ml−1 per peptide) and brefeldin A (10 µg ml−1; Sigma-Aldrich). After 16–18 h restimulation, PBMCs were collected, washed in PBS and stained for fixable live/dead (Near infrared, Thermo Fisher Scientific, 1:1,000), washed in PBS, before being fixed in fix/perm buffer (TF staining buffer kit, eBioscience) for 20 min at room temperature.
Cells were washed in PBS and incubated in perm buffer (TF staining buffer kit, diluted 1:10 in double-distilled H2O) for 20 min at room temperature, washed in PBS and resuspended in perm buffer with saturating concentrations of anti-human antibodies for intracellular staining: anti-IL-2 PerCp-eFluor710 (Invitrogen, MQ1-17H12, 1:50), anti-TNFα FITC (BD Biosciences, MAb11, 1:100), anti-CD8α BV785 (BioLegend, RPA-T8, 1:200), anti-IFNγ BV605 (BD Biosciences, B27, 1:100), anti-IFNγ APC (BioLegend, 4S.B3, 1:50), anti-CD3 BUV805 (BD Biosciences, UCHT1, 1:200), anti-CD4 BUV395 (BD Biosciences, SK3, 1:200), anti-CD154 (CD40L) Pe-Cy7 (BioLegend, 24-31, 1:50) and anti-MIP-1β PE (BD Biosciences, D21-1351, 1:100). Cells were washed twice in PBS and analysed using the BD LSRII flow cytometer. Cytometer voltages were consistent across batches. Fluorescence minus one (FMO) controls and unstimulated samples were used to determine gates applied across samples. Data were analysed using FlowJo v.10.7 (TreeStar).

Optimization experiments showed that the use of recombinant human IL-2 increases non-peptide-specific proliferation of T cells but is essential for optimal expansion of proliferating, cytokine-producing peptide-specific T cells. CTV dilution and staining with anti-human-IFNγ antibodies were used to identify antigen-specific T cells. An unstimulated control well (equivalent DMSO to peptide wells added) was included for each PBMC sample and the percentage of CTVloIFNγ+CD4+ or CD8+ cells proliferating in unstimulated wells was subtracted as background cytokine release from all peptide-stimulated wells. The T cell proliferation assay above was used to expand SARS-CoV-2-specific T cells and a two-dimensional matrix (Supplementary Table 2) was used such that each 15-mer peptide was represented in 2 pools, aiding the identification of individual immunogenic 15-mer peptides. T cell responses were then confirmed by repeated expansion with individual 15-mers.
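
The logic of a two-dimensional matrix can be sketched as follows: peptides are laid out on a grid, each row and each column forms one pool, and a peptide is a candidate when both of its pools give a response. The grid layout, pool names and function names below are hypothetical and do not reproduce Supplementary Table 2.

```python
def build_matrix_pools(peptides, n_cols):
    """Assign each peptide to exactly one row pool and one column pool."""
    rows, cols = {}, {}
    for idx, pep in enumerate(peptides):
        r, c = divmod(idx, n_cols)
        rows.setdefault(f"R{r + 1}", []).append(pep)
        cols.setdefault(f"C{c + 1}", []).append(pep)
    return rows, cols


def candidate_peptides(rows, cols, positive_pools):
    """Peptides lying at the intersection of a positive row and a positive column."""
    pos_rows = [set(p) for name, p in rows.items() if name in positive_pools]
    pos_cols = [set(p) for name, p in cols.items() if name in positive_pools]
    return sorted({pep for r in pos_rows for c in pos_cols for pep in r & c})
```

When several row or column pools are positive the intersections become ambiguous, which is why candidates are subsequently confirmed with individual 15-mers.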

Polyfunctionality, defined as the number of cytokines co-produced by T cells after expansion for 10 days, was assessed using SPICE (v.6.0) and pestle (v.2.0), available at GitHub (https://niaid.github.io/spice/)52. Responses <0.1% of CD4+ or CD8+ T cells were excluded. Boolean gating was used to identify the percentage of T cells making the 31 possible combinations of the following cytokines: IFNγ, TNF, IL-2, CD154, MIP-1β. Pestle was used to background-subtract the percentage of cytokine-producing cells from unstimulated wells that were run in parallel and to format data for visualization in SPICE. The proportion of T cells making a specific number of cytokines in combination is presented as pie graphs (base mean) and pie arcs represent the proportion making a given cytokine. The RTC-specific T cell polyfunctionality was calculated as an average over T cell responses to NSP7, NSP12 and NSP13 and the structural-specific T cell polyfunctionality is an average of responses to spike, ORF3a, M and NP (Extended Data Fig. 3d).
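
The figure of 31 combinations follows from the five markers: every non-empty subset of five functions is one Boolean gate (2^5 − 1 = 31). A short enumeration, given purely as an illustration and independent of SPICE or pestle, makes the count explicit; the function name is hypothetical.

```python
from itertools import combinations

MARKERS = ("IFNγ", "TNF", "IL-2", "CD154", "MIP-1β")


def boolean_gates(markers):
    """All non-empty marker combinations, ordered by number of functions."""
    return [combo
            for k in range(1, len(markers) + 1)
            for combo in combinations(markers, k)]


gates = boolean_gates(MARKERS)
print(len(gates))  # 31
```

Grouped by the number of co-produced functions, the gates split into 5, 10, 10, 5 and 1 combinations for one to five functions, respectively.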

MHC class I pentamer staining

HLA-A*02-restricted pentamers (Proimmune) of the following specificities were used: SARS-CoV-2 NSP7(27–35) (KLWAQCVQL) or HCoV HKU1 NSP7(27–35) (KLWQYCSVL; ex vivo stains only). For post-expansion staining, antigen-specific T cells were expanded with a cognate 9-mer peptide of SARS-CoV-2 or HCoV HKU1 sequence for 8–10 days as above (2 μg ml−1 per peptide) in R10 supplemented with 0.5 μg ml−1 soluble anti-CD28 antibodies and 20 U ml−1 recombinant human IL-2; medium was added on days 1, 3 and 6 before pentamer staining. For ex vivo staining, PBMCs were thawed and washed twice in PBS. Pentamers were centrifuged at 13,000 rpm for 10 min before use. PBMCs (0.5–2 × 10^6) were stained with 1 μl pentamers at room temperature for 20 min in 50 μl PBS in a 96-well plate. PBMCs were further stained with Blue fixable Live/dead (Invitrogen, 1:1,000) for 20 min at 4 °C, and surface-stained with a mixture of saturating concentrations of monoclonal antibodies for 30 min at 4 °C: anti-CD3 BUV805 (BD Biosciences, UCHT1, 1:200), anti-CD4 BUV395 (BD Biosciences, SK3, 1:200), anti-CD56 Pe-Cy7 (BD Biosciences, NCAM16.2, 1:100), anti-CD8α Alexa700 (BioLegend, RPA-T8, 1:200) and, post-expansion, anti-CD19 Bv786 (BD Biosciences, HIB19, 1:100). PBMCs were fixed with 1% paraformaldehyde and flow cytometry was performed as above using a BD LSRII flow cytometer. Data were analysed using FlowJo v.10.7 (TreeStar). During analysis, stringent gating criteria were applied (the gating strategy is shown in Extended Data Fig. 6c) with doublet, dead cell, CD19+ B cell (post-expansion) and CD56+ NK/NKT exclusion to minimize non-specific binding contamination. HLA-mismatched PBMCs (non-HLA-A*02) and fluorescence minus one controls for pentamers were stained in parallel to assess non-specific binding (Extended Data Fig. 6c).

Coronaviridae family sequence homology analyses

The sequence homology of SARS-CoV-2-derived peptides to HCoV sequences was computed as previously described32. In brief, the SARS-CoV-2 proteome (NC_045512.2) was decomposed into 15-mer peptide sequences overlapping by 14 amino acids. A protein BLAST search of each 15-mer peptide was then performed against a custom sequence database comprising 2,531 Coronaviridae sequences32. Homology values of each SARS-CoV-2-derived peptide to viral accessions with ‘229E’, ‘OC43’, ‘NL63’ or ‘HKU1’ included in the species name and that were isolated from human hosts were retained (Supplementary Table 1). Moreover, to determine whether the conservation of 15-mer peptides differed between the SARS-CoV-2 proteins, the average homology of peptides within each protein was computed. A permutation test was conducted to test whether the difference in average homology between two proteins, Δh, was statistically significant. In brief, the protein membership of each 15-mer peptide was permuted (1,000 iterations). The Δh of the two proteins was then calculated at each iteration, resulting in a final null distribution of Δh values. P values were computed as the proportion of permutations that yielded a Δh at least as extreme as the observed Δh of the two proteins. Custom scripts used to perform the homology searches, heatmap visualization and permutation testing are available at GitHub (https://github.com/cednotsed/tcell_cross_reactivity_covid.git).
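
The permutation test described above can be outlined as follows. This is a minimal sketch and not the authors' script (which is available at the GitHub link above): it assumes a two-sided comparison based on the absolute value of Δh, and the function name, seed argument and toy inputs are hypothetical.

```python
import random
from statistics import mean


def permutation_test(homology_a, homology_b, n_iter=1000, seed=0):
    """Return (observed delta_h, P value) for two proteins' peptide homologies."""
    rng = random.Random(seed)
    observed = mean(homology_a) - mean(homology_b)
    pooled = list(homology_a) + list(homology_b)
    n_a = len(homology_a)
    extreme = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)  # permute the protein membership of each peptide
        delta = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(delta) >= abs(observed):  # assumed two-sided criterion
            extreme += 1
    return observed, extreme / n_iter
```

With 1,000 iterations the smallest non-zero P value that can be resolved is 0.001; a returned value of 0 is therefore best reported as P < 0.001.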

For sequence alignments of immunogenic 15-mers or at described MHC class I-restricted epitopes, reference protein sequences for ORF1ab (accession numbers: QHD43415.1, NP_828849.2, YP_009047202.1, YP_009555238.1, YP_173236.1, YP_003766.2 and NP_073549.1) were downloaded from the NCBI database (https://www.ncbi.nlm.nih.gov/protein/) as previously described8. Sequences were aligned using the MUSCLE algorithm with the default parameters and percentage identity was calculated in Geneious Prime 2020.1.2 (www.geneious.com). Alignment figures were generated using Snapgene 5.1 (GSL Biotech).

SARS-CoV-2 species genome diversity analyses

For genome diversity analysis, a complete masked alignment was downloaded from the GISAID53,54 EpiCoV database on 26 July 2021 together with a GISAID Audacity phylogeny comprising 611,893 accessions (a full list and metadata are available at Figshare (https://figshare.com/s/049d53f789a8b111b87e)). The alignment was subsampled to include 800 of each defined NextStrain phylogenetic clade, as provided by GISAID metadata. For clades containing fewer than 800 accessions, all representatives of that clade were included, resulting in a comprehensive sampling over the global phylogeny of 13,785 accessions encompassing the genomic diversity of SARS-CoV-2 to date (Extended Data Fig. 5a). Diversity along the genome was assessed using two metrics of diversity: the number of recurrent mutational emergences (homoplasies) at any position and Nei’s genetic diversity index55. Homoplasy counts per locus were computed through application of the HomoplasyFinder screening pipeline56 against a maximum likelihood phylogeny constructed over the 13,785-genome alignment. Nei’s genetic diversity index was computed as \(H = 1 - \sum_{i=1}^{I} p_i^{2}\), where I is the count of distinct alleles at a position and \(p_i\) (i = 1, …, I) is the frequency of allele i in the studied alignment. The average homoplasy count per locus per gene region and average Nei’s genetic diversity per gene region were computed by normalizing the per-locus values to gene length for all ORFs and NSPs according to the reference annotations of GISAID reference genome EPI_ISL_402124. Significant differences between all pairwise combinations of ORF/NSP were assessed using the Wilcoxon rank-sum test implemented in compare_means() in the R package ggpubr v.0.4.0 (Extended Data Table 3).
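
Nei's genetic diversity index at a single alignment position, H = 1 minus the sum of squared allele frequencies, can be computed as in the sketch below. It is illustrative only: the function name is hypothetical, and the exclusion of masked or gap characters ("N" and "-") from the allele counts is an assumption rather than a documented step.

```python
from collections import Counter


def nei_diversity(column, ignore=("N", "-")):
    """Nei's genetic diversity for one alignment column (string of alleles)."""
    alleles = [a for a in column.upper() if a not in ignore]
    n = len(alleles)
    if n == 0:
        return 0.0  # no informative alleles at this position
    counts = Counter(alleles)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())
```

A fully conserved position gives H = 0, two alleles at equal frequency give H = 0.5, and four equally frequent nucleotides give H = 0.75; averaging the per-position values over a gene gives the per-region figure.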

IFI27 qPCR

Total RNA from Tempus blood was extracted using the Tempus Spin RNA isolation kit (Applied Biosystems, 4380204). cDNA was obtained using the High-Capacity cDNA Reverse Transcription Kit (Applied Biosystems). Quantitative PCR (qPCR) was performed using the TaqMan Fast Advanced Master Mix (Applied Biosystems) on the ABI StepOnePlus Real-Time PCR machine (Applied Biosystems). The following cycling conditions were used: 95 °C for 2 min; followed by 40 cycles of 95 °C for 3 s and 60 °C for 30 s. IFI27 and GAPDH were amplified using the TaqMan Gene Expression Assay probes Hs01086373_g1 (IFI27) and Hs02786624_g1 (GAPDH). GAPDH was used as a housekeeping gene control. The unexposed prepandemic control HCW cohort for qPCR analysis was described previously57.

Correlogram plot

A pairwise correlation matrix between variables was calculated and visualized as a correlogram using corrplot (https://github.com/taiyun/corrplot) in R v.3.5.3 with RStudio v.1.0.153. The Spearman’s rank correlation coefficient r is indicated by the size and colour of the circles. Only correlations with P < 0.05 are shown. Variables are ordered by hierarchical clustering.

Statistics and reproducibility

Data were assumed to have a non-Gaussian distribution and nonparametric tests were used throughout. For single paired and unpaired comparisons, Wilcoxon matched-pairs signed-rank tests and Mann–Whitney U-tests were used, respectively. For multiple unpaired comparisons, Kruskal–Wallis one-way ANOVA with Dunn’s correction was used. For correlations, Spearman’s r test was used. P < 0.05 was considered to be significant. Prism v.7.0e and v.8.0 for Mac were used for analysis. Details are provided in the figure legends.

Data reporting

Power calculations were used to estimate the sample size needed for the week 16 substudy (see above). No statistical methods were used to predetermine the sample size. For all of the assays, samples from each cohort were run in parallel to reduce the impact of interbatch technical variation. IFNγ ELISpot assays were performed on HCW cohorts before unblinding of the group (laboratory-confirmed infection or SN-HCW). Other experiments were not randomized and the investigators were not blinded to allocation during experiments and outcome assessment.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.