Main

The importance of T cell immunity to influenza A virus (IAV) is supported by studies in animal models and humans1,2, and has received increasing attention because a CD8+ T cell–based vaccine against a conserved epitope potentially could provide broad protection despite viral antigenic shift and drift3. The antiviral CD8+ T cell response is initiated by interaction between clonally distributed TCRα and TCRβ heterodimers and viral peptide loaded on MHC-I. Genes that encode TCRs are assembled by recombination of TCRα variable (TRAV) or TCRβ variable (TRBV) gene segments that encode variable complementarity-determining CDR1 and CDR2 regions, with TCRα joining (TRAJ) (or TCRβ diversity/joining) gene segments that encode hypervariable CDR3 regions. The HLA-A2–M1 epitope, composed of M1 58–66 (M1), a nonameric peptide (residues 58–66) from the IAV matrix protein that is presented by the common human MHC-I allelic variant HLA-A*02:01, is a highly conserved immunodominant epitope4,5,6 that is abundantly expressed in infected cells7. Previous studies of the M1-specific CD8+ T cell response suggest that the TCRβ repertoire that responds to HLA-A2–M1 is highly biased toward the use of the gene TRBV19 (up to 98%; refs. 8,9,10), with a highly conserved CDR3β motif, xR98S99x (refs. 8,9,11). TCRα bias is less dramatic, but preferential usage of TRAV27 and TRAJ42 gene segments has been reported8,9,12. As is the case with many viruses that infect hosts chronically or recurrently, IAV infection results in 'public' TCRs with identical or near-identical patterns of V-region, J-region, and junctional sequences among HLA-A2-matched but otherwise genetically unrelated individuals.

A crystal structure of HLA-A2–M1 bound to one of these canonical public TCRs (JM22) showed that most of the amino acid side chains of M1 were buried in the peptide-binding cleft of HLA-A2 (refs. 13,14). This 'featureless' HLA-A2–M1 complex was recognized mainly by residues from CDR1β, CDR2β and Arg98 of the CDR3β xR98S99x motif, which explains the biased selection of TRBV19 and the role of the conserved CDR3β motif, with few MHC or peptide contacts from TCRα side chains14. It has been suggested that featureless peptides (or those with less recognizable features) are more prone to TCR bias than peptides with easily recognized features, because of a dearth of available recognition modes15,16,17. Direct proof of this concept came from an elegant study18 in which the highly featured PA224 epitope from influenza acidic polymerase presented by H-2Db was mutated to a more featureless version, which induced a change from a diverse TCR repertoire to a more restricted one. Several studies have suggested that diverse TCR repertoires that recognize virulent viruses are correlated with efficient control of viral infection19,20,21 and reduced viral escape22. Thus there is a concern about restricted TCR repertoires because of the associated possible loss of protection by either clonal loss or viral escape mutation. In one study, the viral load of simian immunodeficiency virus was inversely correlated not with epitope-specific CD8+ T cell frequency, recruitment to target organ, multifunctionality, or ability to recognize mutated virus, but rather with the number of public TCR clonotypes23, which implies that knowledge about the size of the TCR repertoire may be a critical component of full understanding of efficient viral control. Despite the increasing availability of high-throughput TCR sequencing strategies24, the breadth of the TCR response to human viral infection has been studied in only a few cases at sequence25,26 or structural levels27,28,29, and no study has been reported that combines both aspects.

Here we systematically examined the HLA-A2–M1-restricted CD8+ T cell repertoire by carrying out a comprehensive analysis of the TCR repertoires of six healthy donors by using next-generation sequencing (NGS) to obtain unbiased information about TRBV and TRAV sequences (IMGT TCR gene nomenclature is used in the paper). We identified tremendous diversity, with many hundreds of unique clonotypes in each donor. We evaluated TCRα- and TCRβ-chain pairing patterns directly ex vivo via single-cell sequencing confirmed by functional analysis in T cells that carried recombinant TCRs. We identified a previously unnoticed public TCR that uses TRAV38/TRAJ52 and TRBV19/J1-2 genes and sequence motifs in both CDR3α and CDR3β beyond the 'xRSx' motif. In addition, we identified many noncanonical M1-specific TCRs with lower frequency in the HLA-A2–M1-specific CD8+ T cell population. X-ray crystal structures of two noncanonical TCRs revealed the structural basis for HLA-A2–M1-recognition without the xRSx motif, and identified unique pockets between the peptide and MHC that appear to be required for the recognition of this featureless epitope. In combination with previous work, this study now provides the most comprehensive look so far at the breadth of the TCR repertoire for a viral antigen, and the structural basis for understanding recognition by thousands of TCRs able to recognize this ubiquitous epitope.

Results

Diverse CD8+ T cell repertoire in HLA-A2–M1-specific response

We analyzed the TCR repertoires of tetramer-sorted CD8+ T cells from peripheral blood mononuclear cells (PBMCs) from six healthy adult donors. Because the average frequency of HLA-A2–M1-specific memory CD8+ T cells in the PBMCs of healthy individuals is less than 0.2% (refs. 9,30,31), we expanded the antigen-specific population in vitro by M1-peptide stimulation (Fig. 1a and Supplementary Fig. 1). We then used the HLA-A2–M1 dextramer to sort antigen-specific cells before isolating cDNA for NGS analysis of the TRAV and TRBV repertoires. Previous approaches that used 5′-RACE PCR (ref. 12) or that used individual primers for each TRAV or TRBV gene and then subcloned32 were not designed to exhaustively sample the repertoire and on average identified 20–100 unique TCRα or TCRβ clonotypes specific to HLA-A2–M1 in any one donor, usually after examination of either the TRAV or the TRBV repertoire but not both11,12,32. By using a strategy that combined in vitro expansion and NGS, we gained a greater appreciation of the complete M1-specific TCR repertoire. We obtained an average of 516 unique TCRα clonotypes (range: 209–1,037) and 432 unique TCRβ clonotypes (range: 150–975) in each individual (Fig. 1b,c and Supplementary Fig. 1b,c), for a total of 2,939 unique TCRα and 2,544 unique TCRβ sequences. These results suggest that previous methods underestimated the diversity of the HLA-A2–M1-specific TCR repertoire (TCRα and TCRβ diversity indices are shown in Supplementary Fig. 1j).
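The clonotype counts and diversity indices mentioned above can be illustrated with a minimal sketch. The text here does not state which diversity index was used, so Shannon entropy is an assumption, and the function names and read labels below are hypothetical placeholders rather than data from this study.

```python
import math
from collections import Counter


def shannon_diversity(clonotype_counts):
    """Shannon entropy (natural log) of a clonotype read-count distribution.

    Assumption: Shannon entropy stands in for the unspecified diversity index.
    """
    total = sum(clonotype_counts.values())
    if total == 0:
        return 0.0
    return -sum(
        (n / total) * math.log(n / total)
        for n in clonotype_counts.values()
        if n > 0
    )


def summarize_repertoire(cdr3_reads):
    """Count unique clonotypes and compute diversity from a list of reads."""
    counts = Counter(cdr3_reads)
    return {
        "unique_clonotypes": len(counts),
        "shannon": shannon_diversity(counts),
    }


# Hypothetical reads: placeholder labels, not real CDR3 sequences.
reads = ["clonotype_A"] * 6 + ["clonotype_B"] * 3 + ["clonotype_C"]
summary = summarize_repertoire(reads)
```

Counting unique clonotypes and weighting them by read frequency are separate summaries, which is why the relative-abundance analysis can be repeated on either basis.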

Figure 1: Diversity of CD8+ T cell repertoire: dominant usage of TRAV38, and CDR3α and CDR3β sequence motifs, in the HLA-A2–M1-specific response.

(a) Schematic of the analysis workflow. CD8+ T cells isolated from donor PBMCs were amplified by M1-pulsed HLA-A2+ antigen-presenting cells, and cDNA of M1-dextramer+CD8+ T cells sorted by FACS was analyzed by NGS. (b,c) HLA-A2–M1-specific TCR repertoires from six healthy donors (185, 215, 240, 264, D085, and D105). The pie charts show the frequency of each TRAV (b) and TRBV (c) sequence in the total HLA-A2–M1-specific TCR repertoire. (d) Frequency of the two most common TRAV genes in the M1-specific TCRα repertoire. (e) Frequency of TRAJ gene usage among TRAV38-containing TCRα receptors in the six donors. TRAJ52 was used almost exclusively. (f) Average frequency of TRAV38-containing TCRs for various CDR3α lengths. (g) Left, average frequencies of different CDR3α lengths within the overall CD8+ T cell population recognizing HLA-A2–M1. Right, pie charts showing TRAV gene usage for TCRs with 15-mer CDR3α, and the major TRAV genes (Vα) from each donor. The numbers in the pie charts represent the percentage of TRAV38 among all M1-specific TCRs with 15-mer CDR3α. Amino acid compositions of 15-mer CDR3α in TRAV38-containing TCRs from six donors were analyzed and are depicted as a sequence logo above the pie charts. Solid lines above logos indicate germline-encoded sequences. Dotted lines indicate partially conserved sequence. (h) Average frequencies of different CDR3β lengths among the overall population of CD8+ T cells recognizing HLA-A2–M1 (left), and TRBV gene usage and sequence logos for TCRs with 10-mer and 11-mer CDR3β (right). Numbers in parentheses below pie charts represent the individual's frequency of either 10-mer or 11-mer clonotypes. In b–h, TRAV and TRBV genes are abbreviated as Vα and Vβ, respectively. In f–h, error bars represent the s.d. of the mean for the six donors. Source data are available online.

Interestingly, even with this much diversity, the HLA-A2–M1-specific CD8+ T cells from each of the donors predominantly used TRBV19, as observed in earlier studies8,9,10. TRBV19 frequencies ranged from 57.1% to 89.5% of all TCRβ sequences read, depending on the donor (Fig. 1c). In addition to the use of TRBV19, there was minor usage of other common TRBV genes that differed from donor to donor with no systematic usage patterns. In contrast to the highly restricted TRBV gene usage, many different TRAV genes were used in the M1-specific TCR response. Previous studies reported preferential usage of TRAV27 (refs. 8,9,12). We found that TRAV27 was used commonly in all six donors, but to different degrees, accounting for up to 47.8% of total TRAV gene sequences in donor 185 but as little as 2.8% in donor D105 (Fig. 1b). Previous studies with fewer sequences showed higher, biased usage of TRAV27 (49–75%)8,9,12. In this study, where we sampled 8,000-fold more sequences, TRAV27 accounted for only 16.8% of the HLA-A2–M1-specific TRAV repertoire on average. In addition to TRAV27, other TRAV genes were commonly detected, including TRAV12 (TRAV12-1, TRAV12-2, TRAV12-3), TRAV13 (TRAV13-1, TRAV13-2), TRAV25, TRAV29 (TRAV29DV5), and TRAV38 (TRAV38-1, TRAV38-2DV8) (Fig. 1b). TRAV38 in particular was found in all six donors, and it was the most abundant TRAV gene in some donors. Although use of this gene has been observed previously in the HLA-A2–M1 response12, its public usage and dominance in many donors was not previously appreciated. Together TRAV27 and TRAV38 accounted for 21–53% of the overall response, depending on the individual (Fig. 1d). Because of the potential for relative TRAV (or TRBV) sequence bias to be introduced during reverse-transcription and amplification steps, we repeated our analyses of relative abundance, using observed numbers of unique clonotypes instead of observed sequence frequencies (Supplementary Fig. 1b–d). In general, we observed similar patterns.

TRAV38 joins with TRAJ52 to form a novel CDR3α region

We examined TCRα repertoires that contained the newly identified dominant TRAV38 to assess TRAJ pairing, CDR3α length, and CDR3 sequence composition. TRAV38 mainly rearranged with TRAJ52 in all six donors (Fig. 1e), with paired frequencies from 83.8% to 100%. The length distribution of the TRAV38-containing CDR3α region was highly restricted, with 96% among the six donors encoding a 15-mer CDR3α (Fig. 1f). In general, the CDR3α length distribution for HLA-A2–M1-specific repertoires carrying any TRAV gene was much broader, with 10–15-mer CDR3α regions all represented (Fig. 1g). However, in the 15-mer group, TRAV38-TRAJ52 gene pairs were the most common, with frequencies ranging from 78.3% to 98.2%, except in donor 215 (42.8%). Sequence analysis of the HLA-A2–M1-specific TRAV38-TRAJ52 repertoires from all six donors revealed a new CDR3α motif: C92AΦx1x2x3AGGTSYGKLTF108, where C92 is the second cysteine of the TRAV gene; F108 is the phenylalanine in the characteristic TRAJ gene motif 'FGxG'; Φ is an aromatic residue; and x1, x2, and x3 represent variable intervening sequences (Fig. 1g). This sequence-composition analysis showed that germline-encoded Phe94/Tyr94 and Ala98 are under selection pressure that may prevent codons for these amino acids from being trimmed off during rearrangement.

These results define a new CDR3α motif in HLA-A2–M1-specific TCRs that consists of exclusive paired TRAV38-TRAJ52 together with extremely restricted length and composition of CDR3α. As this motif constitutes most of the HLA-A2–M1-specific TCRs with 15-mer CDR3α, these particular features may be important in the recognition of HLA-A2–M1.
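The 15-mer CDR3α motif described in this section can be written as a pattern for screening sequence lists. The following is a minimal sketch, not the analysis pipeline used in this study. It assumes that Φ means Phe, Trp, or Tyr, that x1, x2, and x3 are each a single residue of any type (consistent with the fixed 15-mer length), and that the sequence is given from C92 through F108; the function names and the test sequence are hypothetical.

```python
import re

# One-letter codes for the 20 standard amino acids.
AA = "ACDEFGHIKLMNPQRSTVWY"

# C92-A-Φ-x1-x2-x3-AGGTSYGKLT-F108
# Assumption: Φ (aromatic) is F, W, or Y; each x is exactly one residue.
TRAV38_TRAJ52_MOTIF = re.compile(rf"^CA[FWY][{AA}]{{3}}AGGTSYGKLTF$")


def matches_cdr3a_motif(seq):
    """Return True if seq (C92 through F108) fits the 15-mer CDR3α motif."""
    return TRAV38_TRAJ52_MOTIF.fullmatch(seq.upper()) is not None


def residue_at(seq, position, first_position=92):
    """Residue at a numbered position, counting the leading Cys as 92."""
    return seq[position - first_position]


# Synthetic example built only to satisfy the pattern; not a reported clonotype.
example = "CAFAAAAGGTSYGKLTF"
is_match = matches_cdr3a_motif(example)
```

With this numbering, the residues between C92 and F108 number 15, and positions 98 and 103 fall on the conserved Ala and Tyr of the motif.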

Two dominant CDR3β motifs of M1-specific TCRs

Two conserved TRBV19 CDR3β motifs, xR98S99x and xG98xY100, have been reported for HLA-A2–M1-specific TCRs with frequencies of 74–84% and 11–13%, respectively9,11. Our NGS analysis identified a strong length bias of CDR3β toward 11-mers at 68% (range: 43–88%) of the total TCRβ repertoire and 10-mers at 12% (range: 3–23%) (Fig. 1h). Sequence analysis showed that the canonical xR98S99x motif was restricted to 11-mer CDR3β, with less bias than previously reported9,11, and represented only 50% (range: 15–72%) of the total M1-specific TCRs. Surprisingly, a single motif (xG98xY100) constituted almost all (98% ± 2%) of the TRBV19 10-mers (Fig. 1h). These xG98xY100 TCRs represented 11.5% (range: 2.3–25.8%) of the total number of M1-specific TCRs. Our NGS analysis showed that TCRs with these common xR98S99x and xG98xY100 motifs dominated the M1-specific TRBV19 repertoire in all of the donors (Supplementary Fig. 1i). These results suggest that studies focused on just TRBV19 with lower numbers of sequences might overestimate the role of the xR98S99x motif while underestimating the truly polyclonal nature of the IAV-M1-specific TCR repertoire and the role of other motifs such as xG98xY100.

TRBV19 xG98xY100 chains paired exclusively with TRAV38-TRAJ52

To obtain information about which TCRα paired with which TCRβ, we used a single-cell sequencing approach in which rearranged TCRα and TCRβ genes were amplified from individual HLA-A2–M1-dextramer+CD8+ T cells sorted directly ex vivo from PBMCs (Supplementary Fig. 2a). TCR genes from individual cells were sequenced via a previously described nested PCR strategy19. Of the 82 productive TCRβ genes sequenced, 33 TCRβ genes paired with multiple TCRα genes, 15 paired with nonproductive single TCRα genes, and 34 paired with productive single TCRα genes (Supplementary Table 1). TRAV and TRBV gene usage patterns (Supplementary Fig. 2b) generally were similar to those obtained by NGS of dextramer+ T cells from the same donor (D085). Of the 34 single TCRα-TCRβ gene pairs, 23 had TRBV19 containing the 11-mer xRSx motif (xR98A99x in some cases) paired with 9 different TRAV genes (group I; Supplementary Table 1). All of these TCRs used TRAJ42 or TRAJ37 gene segments.

Surprisingly, all of the TCRβ chains with the CDR3 10-mer xG98xY100 motif paired exclusively with the novel TCRα composed of the paired TRAV38-TRAJ52 genes and the novel 15-mer CDR3α motif (group II; Supplementary Table 1). The greater abundance of TRBV19 clones with the xR98S99x motif might be related to their ability to successfully pair with many different TRAV genes, in contrast to the stringent pairing requirement for those with xG98xY100 motifs. The structural basis for this unusual pairing selection is described below.

Featureless surface presented by HLA-A2–M1

HLA-A2–M1 has been considered a relatively featureless ligand13,33 because most M1 side chains are buried within the peptide-binding pocket of HLA-A2, in contrast to other MHC-I–peptide complexes such as HLA-A2–RT or HLA-A2–tax in which one or more peptide side chains are largely exposed (Supplementary Fig. 3a,b). The solvent-accessible area of M1 peptide bound to HLA-A2 is 248 Å2, the smallest among 107 structures of free peptide–HLA-A2 complexes deposited in the Protein Data Bank (PDB) (Supplementary Fig. 3c), although the total peptide buried surface area (BSA) reflecting MHC-peptide interaction is slightly above the average (Supplementary Fig. 3d). It has been suggested that TCRs specific for HLA-A2–M1 would necessarily be highly restricted because of the limited ways available to recognize a featureless peptide13. Because we observed a broad response to HLA-A2–M1 that included TCRs with different recognition motifs, we were interested in how the newly identified TCRs could specifically recognize the rather featureless M1 ligand.

Characterization of TCR proteins

To validate the TCR α/β-chain-pairing information from single-cell PCR, we assembled 13 representative full-length TCR α/β-chain genes via overlapping PCR and expressed them in TCRα/β-deficient Jurkat T cells expressing human CD8α (J76-CD8 cells). We selected seven TCRα/β pairs from group I that contained the canonical xRSx motif (LS02–LS06, LS08, LS13), three from group II that used the newly identified dominant public TCR encoded by TRAV38/TRAJ52/TRBV19/TRBJ1-2 (LS10, LS11, LS12), two from group III with unique 11-mer TRBV19 CDR3β sequences with aromatic amino acids in place of Arg98 in the xR98S99x motif (LS01, LS07), and one TCR from group IV (LS09) that represented the set of TCRs that lacked CDR3 homologies among different individuals (Supplementary Table 1). Each TCRα-TCRβ pair was expressed at the cell surface (Supplementary Fig. 4), each could bind HLA-A2–M1 dextramer (Fig. 2a), and each could initiate T cell signaling responses as measured by CD69 upregulation after stimulation with HLA-A2+ cells pulsed with M1 peptide but not with control peptides (Supplementary Fig. 4d). These data support the reliability of chain-pairing information from single-cell PCR and the functional competence of the resultant TCR proteins.

Figure 2: TCRα-TCRβ pairs bind HLA-A2–M1 and stimulate T cell signaling.

(a) J76-CD8 cells transiently expressing each of 13 full-length TCRα-TCRβ pairs (LS01–LS13) stained with M1-HLA-A2 dextramer or the negative control BRLF1–HLA-A2 dextramer showed specific binding by cloned TCRαβ. TCR surface expression levels for these transfectants are shown in Supplementary Figure 4a, and T-cell-activation levels induced by peptide-pulsed HLA-A2+ antigen-presenting cells are shown in Supplementary Figure 4b. PE, phycoerythrin. (b) Representative TCRs (group I, JM22; group II, LS10; group III, LS01) showed dose-dependent HLA-A2–M1 tetramer (M1tet) binding. J76-CD8 cells stably expressing LS01, LS10, or JM22 were stained with increasing concentrations of M1 tetramer. The geometric mean fluorescence intensity (MFI) of bound HLA-A2–M1 tetramer, after subtraction of the value for empty-vector controls, is plotted against increasing M1-tet concentration, with apparent half-maximal binding concentrations (Kd,app) indicated. Data are shown as the average ± range for two independent experiments. TCR surface-expression levels are shown in Supplementary Figure 4d, and FACS plots are shown in Supplementary Figure 4c. (c) LS01 and LS10 recognized M1 peptide as efficiently as canonical TCRs (JM22 and LS06). J76-CD8 cells expressing TCRs were stimulated by T2 cells loaded with increasing amounts of M1 peptide, and surface expression of the activation marker CD69 was measured. EC50 values are shown. Data are shown as mean and s.d. of triplicate measurements. (d) Soluble LS01 and LS10 TCR proteins bound to immobilized HLA-A2–M1. Increasing concentrations of soluble LS01 and LS10 proteins were applied to immobilized HLA-A2–M1 in surface plasmon resonance experiments. Increased response units relative to the control channel (dRU) are plotted against the soluble TCR (sTCR) concentration. Equilibrium binding constants (Kd) from a fit to a single-site binding equation are shown, ± the s.d. of three independent experiments. The measured Kd for LS06 was 2.1 ± 0.2 μM.

We chose for further characterization LS10 and LS01 as representatives of groups II and III, respectively, and compared them to JM22, the prototypical canonical group I public TCR recognizing HLA-A2–M1 (ref. 9). We evaluated the relative HLA-A2–M1 binding activity of these TCRs after stable expression in J76-CD8 cells. We observed concentration-dependent tetramer binding for JM22, LS10, and LS01 (Supplementary Fig. 4b), with similar half-maximal effective concentrations (EC50) (Fig. 2b) but with different maximum binding levels, consistent with the respective TCR expression levels (Supplementary Fig. 4c). We compared the relative functional sensitivity of the TCRs in response to stimulation by peptide-pulsed antigen-presenting cells (Supplementary Fig. 4d). EC50 values for CD69 upregulation were similar for JM22, LS10, and LS01 (Fig. 2c). Finally, we prepared soluble TCR and MHC proteins and evaluated binding directly by surface plasmon resonance (Fig. 2d). Apparent Kd values for LS01 and LS10 (32 and 30 μM, respectively) were in the range previously observed for agonist MHC peptide34,35 but were somewhat weaker than reported previously for JM22 (5 μM; ref. 14). To validate the somewhat weaker activation response for JM22 despite its apparent higher affinity, we also evaluated the LS06 TCR (LS06 is almost identical to JM22, but with Thr97β in place of Ser97β at a position that does not contact any MHC-peptide residues), which exhibited an EC50 for CD69 upregulation that was essentially identical to that of JM22. Overall, LS01, LS10, and JM22 had similar binding and activation characteristics despite their use of different recognition motifs.
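To make the affinity comparison concrete, the single-site binding equation referred to in the Figure 2d legend can be sketched as response = Rmax × [TCR] / (Kd + [TCR]). The snippet below is an illustrative calculation only, not the fitting procedure used for the surface plasmon resonance data. The Kd values are those quoted in the text, whereas the function name, the normalized Rmax of 1, and the 10 μM test concentration are assumptions chosen for illustration.

```python
def single_site_response(conc_um, kd_um, r_max=1.0):
    """Equilibrium response for single-site binding: Rmax * c / (Kd + c).

    Concentrations and Kd are in micromolar; r_max=1.0 gives fractional occupancy.
    """
    if conc_um < 0 or kd_um <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return r_max * conc_um / (kd_um + conc_um)


# Kd values (in micromolar) as quoted in the text; JM22 is the previously reported value.
KD_UM = {"LS01": 32.0, "LS10": 30.0, "JM22": 5.0}

# Fractional occupancy at an arbitrary, illustrative concentration of 10 micromolar.
occupancy_at_10um = {
    name: single_site_response(10.0, kd) for name, kd in KD_UM.items()
}
```

At a concentration equal to Kd the predicted occupancy is one half, so a lower Kd means that half-maximal binding is reached at a lower soluble TCR concentration.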

Crystal structures of noncanonical TCRs LS10 and LS01

To investigate how LS01 and LS10 TCRs recognize HLA-A2–M1 without the canonical CDR3β xRSx motif, and to discover why specific CDR3α sequences are required, we determined the X-ray crystal structures of these TCRs when bound to HLA-A2–M1. Data collection and refinement statistics are shown in Table 1, and representative omit maps are shown in Supplementary Figure 5. Except for a few disordered loops, the TCR, MHC, and peptide all were defined well by the available data, with clear electron density and good geometry in the final models (Online Methods). Overall, the structures of LS01 and LS10 bound to HLA-A2–M1 showed that they docked similarly to JM22 (Fig. 3). All three TCRs used a conventional diagonal binding mode, with crossing angles36 of 70–85°, and CDR3α and CDR3β loops centered over the peptide (Fig. 3c,d). The interaction between TCR and MHC–peptide buried similar amounts of surface area (Fig. 3b). Footprints of the TCRs on the MHC–peptide were broadly similar, with a greater contribution by TCRα for both LS01 and LS10, mostly because of CDR1α in LS01 and CDR3α in LS10 (Fig. 3d). The HLA-A2–M1 component of the LS01–HLA-A2–M1 complex is essentially identical to the structure of free unliganded HLA-A2–M1 (Fig. 3c), except for a few key residues such as Arg65MHC and Gln155MHC that are members of the 'restriction triad'35. Differences between the crystal structures and the roles of the various CDR loops and TCR, MHC, and peptide structural rearrangements are discussed in detail in sections that follow.

Table 1 Data collection and refinement statistics (molecular replacement)

Figure 3: Structural comparison of three TCRs docked onto HLA-A2–M1.

(a) Ribbon diagrams of three TCRs bound to HLA-A2–M1. The common TRBV19 TCRβ chain paired with different TCRα chains and bound to M1 peptide presented by HLA-A2 heavy chain (MHC) and β2-microglobulin (β2m). (b) BSAs at the interface between TCRs and HLA-A2–M1 plotted against the contributions of TCR α- and β-chains, M1 peptide, and MHC, color-coded as in a. The total BSA was 2,137, 2,231, and 1,838 Å2, respectively, for LS01, LS10, and JM22. (c) CDR3α and CDR3β loops orient over the M1 peptide with different interactions. Unliganded HLA-A2–M1 (PDB 2VLL) is shown at the far right. (d) Surface representations of HLA-A2, shown with footprints of TCRs color-coded by CDR loop. Locations of M1 peptide are outlined by dashed lines. Source data are available online.

LS10: a new mode of HLA-A2–M1 recognition

LS10 uses a novel mechanism to recognize HLA-A2–M1 that involves a conformational change in the M1 peptide induced by interactions with the long CDR3α loop encoded mostly by TRAJ52. In complex with LS10, M1 bound to HLA-A2 adopts a conformation that is different from those observed in free HLA-A2–M1 and in HLA-A2–M1 of the JM22-bound complex, with changes concentrated in the center of the peptide (Fig. 4). The Phe5-p side chain adopts a different rotamer and moves toward the MHC α2-helix, with the Cα atom and nearby main chain moving by 1 Å, and the side chain phenyl ring moving by almost 5 Å. These changes appear to be induced by interaction with Ala98α and Tyr103α from the LS10 CDR3α loop (Fig. 4b). These residues pack against the Phe5-p side chain, but they would clash severely if Phe5-p were to retain the original conformation as observed in the unliganded HLA-A2–M1 structure.

Figure 4: The LS10 TCR uses conserved 15-mer CDR3α and xGxY CDR3β motifs to select an M1 peptide conformation with Phe5-p occupying the notch between peptide and MHC.

(a) Top and side views of HLA-A2–M1 structures before (red) and after (blue) LS10 ligation, showing that M1 undergoes substantial movement after TCR engagement (the dotted line indicates the HLA-A2 surface). (b) Tyr103α and Ala98α of CDR3α make close contacts with Phe5-p of M1 in the new conformation. (c–e) Rearrangement of M1 peptide after interaction with LS10. Top views of HLA-A2–M1 are shown with MHC in gray and peptide in color before (c) and after (d,e) LS10 ligation. Notches between MHC and peptide (red circles) are filled with displaced Phe5-p (d) and CDR3α residues. pMHC, peptide–MHC complex. (e) Close packing among Tyr103α (CDR3α), Gly98β (CDR3β), and Phe5-p (M1). (f) Similar view as in e, but after JM22 binding. (g) CDR3β and CDR3α loops of LS10 near the α2-helix (white) and M1 peptide (blue). Hydrogen bonds are shown as dashed lines. (h) CDR3α of ligated LS10 adopts a structured configuration with two β-hairpins (blue dashed ovals). Trp94α and Thr107α make two hydrogen bonds (black dashed lines). Nonconserved residues (“xxx” in CAΦxxxAGGTSYGKLTF) are labeled in red. (i) Trp94α (blue) of CDR3α is surrounded by TRAV38-specific residues (green) and CDR3α (gray).

The motion of the Phe5-p side chain fills a notch in the unliganded structure lined by Phe5-p and Phe7-p from M1, and Ala150MHC, Val152MHC, and Gln155MHC from the HLA-A2 α2-helix (Fig. 4c). This notch is believed to play a key role in the recognition of HLA-A2–M1 by canonical TCRs, with the side chain of the conserved Arg98β from the xRSx motif being inserted into the notch, as observed in the JM22–HLA-A2–M1 complex (compare the region enclosed by the dotted circle in Fig. 4c and the yellow surface in Fig. 4f). The motion of Phe5-p into the notch opens up a shallow hydrophobic pocket (Fig. 4d). In the LS10 complex, this new pocket becomes occupied by Ala98α and Gly99α and is covered by the side chain of Tyr103α, all from CDR3α (Fig. 4e). In this region, CDR3α is closely apposed to the corresponding CDR3β loop from the other TCR subunit (Fig. 4g), and Gly98β from the CDR3β xG98xY100 motif lodges between the phenyl ring of the displaced Phe5-p and the phenyl ring of Tyr103α from CDR3α (Fig. 4e), leaving no room for a side chain at position 98 of CDR3β. The tight packing of these four residues (Ala98α, Tyr103α, Gly98β, and Phe5-p) helps explain the strict pairing and sequence requirements of the 15-mer CDR3α CAΦxxxA98GGTSY103GKLTF motif and the 10-mer CDR3β xG98xY100 motif. The side chain of Tyr100β, the other component of the xG98xY100 CDR3β motif, packs against Gln155MHC in the HLA-A2 α2-helix (Fig. 4g). Gln155MHC has been referred to previously as a 'gatekeeper'37, and it regulates access to the notch by Arg98β from the xR98Sx motif in canonical TCR recognition of HLA-A2–M1 (ref. 13).

We validated the importance of these interactions by mutagenesis as described in Supplementary Note 1. We also investigated the role of other residues in the long 15-mer CDR3α motif, including Gly99α, Gly100α, and Gly104α, which appear to be required for the formation of two hairpin loops in the CDR3α loop structure that allow Tyr103α to pack against Ala98α on the side of the CDR3α loop (Fig. 4h and Supplementary Note 1); Tyr94α encoded by the extreme 3′ end of TRAV38, which nestles into a hydrophobic pocket formed by TRAV38 residues from the framework region; and Thr107α from TRAJ52 (Fig. 4i and Supplementary Note 1), as well as features that stabilize the convoluted structure adopted by the long 15-residue TRAJ52 gene segment in the LS10 TCR as compared with other TCRs that use this gene segment (Supplementary Fig. 6 and Supplementary Note 1).

Overall, LS10 recognizes the relatively featureless M1 peptide by inducing a peptide conformational change that opens up a small pocket able to be accessed by residues on the sides of a long CDR3α loop, with the interaction dependent on TRAV-TRAJ gene pairing and the selection of a particular 10-mer CDR3β xGxY motif.

LS01: another solution to HLA-A2–M1 recognition

Among TRBV19-containing TCRs that recognize HLA-A2–M1, a substantial fraction with 11-residue-long CDR3β did not contain the predominant xRSx motif, and instead often encoded hydrophobic residues such as Phe, Tyr, or Leu in place of Arg98β. The frequencies of this type of TCR in the six donors investigated ranged from 1% to 4% of total M1-specific TCRs (Fig. 5a). Previous work on the canonical JM22 TCR revealed that Arg98β of the xR98S99x motif is critical for M1 interaction, and mutations of Arg98 to histidine or alanine are not tolerated14. We tested whether other amino acids could replace Arg98β of JM22 by substituting it with residues that retain charge, hydrogen bonding, or nonpolar characteristics of the arginine side chain. Mutated JM22 TCRs were expressed transiently in J76-CD8 cells and assessed for HLA-A2–M1-tetramer binding (Fig. 5b). Arg98β in JM22 was highly resistant to mutation, with Lys98, Gln98, Phe98, and Tyr98 substitutions all leading to complete loss of tetramer binding. These results raised the question of how group III TCRs like LS01, which have an aromatic group at position 98, are able to recognize HLA-A2–M1.

Figure 5: The LS01 TCR uses CDR3β Phe98 to occupy the notch between peptide and MHC with additional interactions from CDR1α, CDR3α, and CDR3β.

(a) CDR3β-sequence-based comparison of LS01 and JM22. LS01 has Phe98β instead of conserved Arg98β. Frequencies of M1-specific TCRs with Phe and Tyr among total M1-specific TCRs with 11-mer CDR3β. (b) Substitution of Arg98β in the xRSx motif abolishes HLA-A2–M1 tetramer binding. The graph shows the relative mean fluorescence intensity of HLA-A2–M1 tetramer (M1tet) bound to JM22 variants. Data are shown as the average and range of two independent samples. (c) CDR3β of LS01 with nearby portions of M1 peptide and MHC α2-helix. Water molecules are shown as green spheres; hydrogen bonds are indicated by dashed lines. (d) CDR1α and CDR3α interactions in the LS01–HLA-A2–M1 interface. Tyr31α of CDR1α is inserted between CDR3α and the MHC α2-helix interacting with Asn95α, Thr94α, Asp93α, Ala158MHC, and Tyr159MHC mainly via van der Waals interaction. (e) Tyr31α and Asn95α are involved in a network of hydrogen bonds with HLA-A2–M1. Source data are available online.

In the crystal structure of LS01 bound to HLA-A2–M1, the CDR3β loop lies above and between the MHC α2-helix and the Phe5-p/Phe7-p region of M1 (Fig. 5c). Residues Ile97β and Phe98β make main-chain van der Waals contacts and water-mediated hydrogen bonds with the M1 main chain, and residues Gln100β and Arg101β make side chain contacts with MHC side chains (Fig. 5c). Prominently, Phe98β inserts its phenyl ring into the hydrophobic notch formed by Phe5-p and Phe7-p of the M1 peptide and the side of the MHC-I α2-helix between Val152MHC and the gatekeeper Gln155MHC (Fig. 5c). This is the same site as occupied by Arg98β of JM22 and by the displaced Phe5-p of LS10, as described above. In the unliganded HLA-A2–M1 structure, Gln155MHC adopts a rotameric conformation that partially blocks access to the hydrophobic notch. In the LS01-bound structure, residue Gln155MHC was displaced from its usual position in the unliganded structure by interactions with CDR3β residues Gln100β and Arg101β (Fig. 5c). These interactions allow access to the hydrophobic notch, which becomes occupied by the side chain of Phe98β. Mutagenesis experiments showed that Phe98β, Gln100β, and Arg101β were each essential for M1 recognition (Supplementary Fig. 6e). Considering that Phe98β, Gln100β, and Arg101β are all encoded by nontemplated sequences and that mutation of any of them almost completely abrogates M1 recognition, it is reasonable that TCRs with non-ArgSer motifs are found at lower frequencies in the overall M1-specific TCR repertoire than are ArgSer-motif-containing TCRs for which only Arg98β is required in CDR3β (ref. 14).

We examined the structure of LS01–HLA-A2–M1 to understand the role of TCRα. The α-subunit of LS01 engages in extensive interactions with HLA-A2–M1, contacting both the peptide and the MHC α2-helix adjacent to the region contacted by CDR3β but closer to the peptide N terminus. Prominent contacts are made by Tyr31α from CDR1α and by Asn95α from CDR3α, which together insert into a cleft between the peptide and the MHC α2-helical region (Fig. 5d and Supplementary Note 1). Nearby, Asn29α forms hydrogen bonds with Glu166MHC of the α2-helix and with Thr94α of CDR3α, thereby providing additional stabilization to the intricate network of interactions in this region (Fig. 5e and Supplementary Note 1). The requirement for specific TCRα interactions from both germline-encoded CDR1α and CDR3α sequences, together with the CDR3β aromatic residue at position 98, could provide a clue about the infrequent usage of group III TCRs such as LS01 as compared with canonical xRSx-motif-containing TCRs.

TCR binding strategies for HLA-A2–M1 recognition

The LS01, LS10, and JM22 TCRs use different strategies to sense the notch between M1 and the MHC α2-helix near Phe5-p. All three TCRs use TRBV19, like most TCRs that recognize HLA-A2–M1. Although they thus share identical CDR1β and CDR2β sequences, there are subtle differences in hydrogen bonding, van der Waals contacts, and water usage (Fig. 6a), which result in different energetic contributions by CDR1β/CDR2β residues (Supplementary Note 1). Despite these differences, the overall location of CDR1β/CDR2β loops is preserved, in particular with Ile53β over the MHC peptide. Pivoting around this residue, the CDR3 loops of the three TCRs move into different locations, and thus recognize different aspects of the HLA-A2–M1 complex (Fig. 6b). LS01 and JM22 sense the common pocket with Phe98β and Arg98β of CDR3β, respectively. LS10, in contrast, recognizes the new notch originally taken by Phe5-p with Ala98α and Tyr103α of CDR3α. Cross-sectional views of the cleft between Phe5-p, Phe7-p, and the α2-helix (section planes indicated in Fig. 6b) showed that the three TCRs fill this notch with different residues (Fig. 6c). In addition, interactions with the gatekeeper Gln155MHC are distinct among the three TCRs. In LS01, CDR3β residues Gln100β and Arg101β and the CDR1α Tyr31α flip the side chain of Gln155MHC toward the α2-helix, opening up the pocket adjacent to Phe7-p for Phe98β (Fig. 5c and Supplementary Fig. 6g). In JM22, the CDR3β Arg98β itself pushes away the gatekeeper to enter the cleft with the aid of Ser100β and the main chain of CDR3α. In LS10, Tyr103α, Tyr100β, and the main chain of CDR3β participate in stabilizing the gatekeeper to create a new cleft for Ala98α/Gly99α.

Figure 6: Different structural solutions to high-avidity binding of a featureless peptide.

(a) Identical CDR1β and CDR2β sequences from three TRBV19-containing TCRs engage in different interactions with HLA-A2–M1. (b) Top views of three HLA-A2–M1–TCR complexes, showing the overall similarity and fine specificity of M1 recognition. TCR-ligated M1 peptide residues (yellow), Ile53β of CDR2β, critical residues from CDR3α or CDR3β, and Gln155MHC of MHC are displayed as surface/stick representations. (c) Sectional views of the three TCRs in the pocket region. The sectional planes are indicated by the dashed lines in b. (d) The percentage of TRBV19 TCRs for each donor with motifs from group I, II, or III. Numbers above bars indicate the total for the three groups. (e) Frequency of group I, II, and III TCRs plotted against the number of TCR residues that make side chain contacts with peptide–MHC in the corresponding crystal structure. Box and whisker plots represent the mean, first and third quartiles, and range of frequencies for five donors. The correlation coefficient R and the P value from nonparametric Spearman correlation analysis (two-tailed) are also shown. Source data are available online.


To further compare the interaction patterns of the three TCRs with HLA-A2–M1, we derived contact maps that showed where TCRα and TCRβ residues contact peptide or MHC (Supplementary Fig. 6j). These revealed that the LS01 CDR1α contacts more of the peptide and MHC α2-helix than does CDR1α of LS10 or JM22, and that LS10 CDR3α interacts more with MHC α1/α2-helices and peptide than does CDR3α of the other TCRs. In contrast to these differences among TCRα chain interactions, the similar distribution of contact residue pairs for TCRβ chain interactions illustrates how similarly the CDRβ loops of the three TCRs interact with MHC helices and peptide (Supplementary Fig. 6j).
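A residue-level contact map of the kind described here can be sketched in a few lines. This is only an illustration and not the procedure used for Supplementary Fig. 6j: the 4.0 Å distance cutoff, the `Atom` record, the function name `contact_map`, and the toy coordinates are all assumptions introduced for the example.

```python
from collections import namedtuple
from math import dist

# Hypothetical minimal atom record: chain label, residue label, xyz coordinates in Å.
Atom = namedtuple("Atom", "chain residue xyz")

def contact_map(tcr_atoms, pmhc_atoms, cutoff=4.0):
    """Count atom pairs within `cutoff` Å for each (TCR residue, pMHC residue) pair."""
    contacts = {}
    for a in tcr_atoms:
        for b in pmhc_atoms:
            if dist(a.xyz, b.xyz) <= cutoff:
                key = ((a.chain, a.residue), (b.chain, b.residue))
                contacts[key] = contacts.get(key, 0) + 1
    return contacts

# Toy coordinates for illustration only (not taken from any deposited structure).
tcr = [
    Atom("TCRb", "res98", (0.0, 0.0, 0.0)),
    Atom("TCRb", "res100", (10.0, 0.0, 0.0)),
]
pmhc = [
    Atom("peptide", "res5", (3.5, 0.0, 0.0)),   # 3.5 Å from TCRb res98
    Atom("MHC", "res155", (10.0, 3.0, 0.0)),    # 3.0 Å from TCRb res100
    Atom("MHC", "res65", (30.0, 0.0, 0.0)),     # far from both TCR atoms
]

cmap = contact_map(tcr, pmhc)
for (tcr_res, pmhc_res), n in sorted(cmap.items()):
    print(tcr_res, "contacts", pmhc_res, "via", n, "atom pair(s)")
```

Grouping the resulting pairs by CDR loop on one axis and by peptide or MHC helix on the other would give a map comparable in spirit to the one discussed above.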

Discussion

Previous studies have suggested that TCRs use a narrow sequence repertoire to recognize HLA-A2–M1 (refs. 8,9,13). In some other systems, constrained repertoires have been associated with poor protection or prognosis19,20,21,22. By deep sequencing TCRα and TCRβ genes from HLA-A2–M1-tetramer-sorted cells from influenza-immune donors, we obtained a more comprehensive and accurate picture of the diversity of TCRs responding to this antigen than in previous studies8,9,10,11,12. We found that the TCR repertoire that recognizes HLA-A2–M1 is substantially broader than previously appreciated, with most donors having several hundred different TCRα and TCRβ sequences used by CD8+ T cells in resting memory. These numbers are in line with estimates of the precursor frequency of naive T cells that recognize various antigenic peptides in other mouse and human models38,39,40. The overall diversity measures for this repertoire were at the upper end of the range of values previously reported for other viral-epitope-specific responses19,32,41,42. As previously reported, a single public TRBV gene segment (TRBV19) with a restricted CDR3β motif (xRS/Ax) (group I) dominated the HLA-A2–M1 response, representing in this cohort 50% (range: 15–72%) of the responding CD8+ T cells and 12–55% of the unique sequences. This is somewhat lower than previous estimates of 74–84% of the overall response9,11. Single-cell PCR and functional analysis of recombinant TCRs transfected into T cells showed that these TRBV19 xRS/Ax chains can pair with many different TCRα receptors, which greatly enhances their chances of selection. We identified a second dominant public TCR with the TRBV19 CDR3β motif, xGxY, which represented 12% (range: 2–26%) of the overall HLA-A2–M1 response (group II), and which was highly restricted to pairing with TRAV38-TRAJ52 TCRα chains with a 15-mer CDR3α motif. 
Another set of TCRs representing 1–4% (group III) used 11-mer CDR3β, but with a hydrophobic residue in place of Arg98 in the xRS/Ax motif. Overall, group I TCRs (range: 27–87%), group II TCRs (range: 2.6–40%), and group III TCRs (range: 1.2–4.8%) constitute most of the TRBV19-restricted response to this important antigen (Fig. 6d).

Together, groups I, II, and III constitute 65% of the overall CD8+ T cell response to HLA-A2–M1 and represent 45% of the sequence diversity. The remaining 35% of the TCR repertoire (group IV, 13–60%) includes TRBV19 and non-TRBV19 TCRs with many different TRAV genes and was highly private and diverse, without any obvious CDR3 motif. This suggests that there may be many other structural solutions for recognizing the relatively featureless HLA-A2–M1 complex. The combination of both conserved public and diverse private components in even a single antigen-specific TCR repertoire may be a basic principle for TCR repertoire structure.

The crystal structures reported here, when considered alongside previous work, reveal the structural basis for the recognition of HLA-A2–M1 by group I (exemplified by JM22 (refs. 13,14)), group II (LS10), and group III (LS01) TCRs, which together accounted for the majority of the overall HLA-A2–M1-specific response to this immunodominant antigen in our cohort. These structures show that there are many ways to recognize a featureless peptide. The different TCRs find different solutions to the specific binding of HLA-A2–M1, but all utilize a small niche between the peptide and the MHC α2-helix. It is tempting to speculate that other TCRs recognizing HLA-A2–M1 might also target these same pockets, and that conformational flexibility in this region might be a defining feature of this MHC–peptide complex that allows it to be specifically recognized despite the absence of overt structural features that differentiate it from HLA-A2 complexes carrying other peptides. The configuration of the M1 peptide in the LS10-bound complex is nearly identical to that of M1 bound to HLA-C*08 (Supplementary Fig. 5b and ref. 43), and we suggest that HLA-C*08–M1-restricted TCRs, which were recently shown to be elicited by IAV infection in HLA-C*08+ donors43, might recognize their ligand by using the same cleft used by LS10.

The experimental equilibrium dissociation constants (Kd) measured for the binding of soluble TCR to immobilized HLA-A2–M1 were similar for LS10 and LS01 TCRs, but approximately five-fold lower for JM22 and the related LS06 (Fig. 2d). Examination of the corresponding crystal structures did not reveal any obvious reasons for these differences: BSA, the number of hydrophobic and total contacts, and the predicted interaction energy all were similar among the three complexes or greater for the weaker-binding LS10 and LS01. One difference is the greater number of interfacial hydrogen-bonding interactions for JM22—seven, as compared to one or two for the other TCRs. Although buried hydrogen bonds are not expected to contribute significantly to the overall binding energy, differential organization of bound solvent in the TCR-peptide-MHC interface has been implicated in the affinity determination for these complexes44, and interfacial hydrogen-bonding might influence this. We did not obtain detailed structural information for unbound LS10 or LS01 TCRs, but binding-induced conformational changes have been suggested to have a role in affinity determination through entropic effects for JM22 TCRs14.

If there are many ways to recognize HLA-A2–M1, why is the repertoire so biased toward TCRs that have TRBV19 with the 11-mer xR(S/A)x motif? It has been suggested that these public TCRs might represent clonotypes present at a high frequency in the naive precursor pool as a result of bias in the recombination machinery45 or convergent recombination of key contact sites46. Although convergent recombination could explain the high abundance of group I TCRs and the relatively low abundance of group II TCRs, it cannot explain the much lower abundance of group III TCRs, as the group I motif xR(S/A)x can be encoded by a similar frequency of random sequences (1.6%) as the group III motif x(F/Y/W)(S/A)x (1.3%). The structural analyses reported here show that TCR immunodominance patterns seem to scale with the number of specific interactions required, and this might provide an alternative explanation for the observed abundance patterns. Group I TCRs are present at the highest frequency, and require only TRBV19 and xRSx-containing 11-mer CDR3β, with wide latitude in TRAV gene usage and CDR3α sequences. Group II TCRs require both TRBV19 with 10-mer xGxY and also TRAV genes with a highly constrained 15-mer CDR3α motif. Group III TCRs are present at an even lower frequency, and require not only a hydrophobic bulky residue in place of Arg98 in xRSx but also a constellation of residues from CDR3β and TCRα. Figure 6e shows group frequencies plotted versus the number of TCR residues with side chains that make contact with peptide–MHC. It seems that TCRs able to find simpler ways to recognize HLA-A2–M1 by involving fewer specific amino acids evolve more easily and come to dominate the memory pool.
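The random-sequence frequencies quoted above for the group I and group III motifs (1.6% and 1.3%) can be reproduced with a simple codon count. The text does not state how those figures were derived, so the following is a sketch under one assumption: each motif position is encoded by a codon drawn uniformly from the 61 sense codons of the standard genetic code, and the flanking x positions accept any residue. The helper name `motif_fraction` is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def motif_fraction(position_sets):
    """Fraction of random sense-codon strings encoding one allowed residue per position."""
    sense = [aa for aa in CODON_TABLE.values() if aa != "*"]
    fraction = 1.0
    for allowed in position_sets:
        fraction *= sum(aa in allowed for aa in sense) / len(sense)
    return fraction

# Only the two constrained positions matter; each 'x' position contributes a factor of 1.
group_i = motif_fraction(["R", "SA"])        # xR(S/A)x
group_iii = motif_fraction(["FYW", "SA"])    # x(F/Y/W)(S/A)x
print(f"group I: {group_i:.1%}, group III: {group_iii:.1%}")
```

Under this assumption, Arg (6 codons) followed by Ser or Ala (10 codons) gives 60/3,721, and Phe, Tyr, or Trp (5 codons) followed by Ser or Ala gives 50/3,721, so the two motifs are indeed nearly equally accessible by chance.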

There have been few reports of multiple TCRs recognizing the same peptide–MHC complex27,28,29,47. The HLA-A2–M1–TCR structures reported here are unique in that they enabled us to infer the recognition mechanisms of a substantial portion of the natural viral-epitope-specific TCR repertoire in any HLA-A2+ individual. By contrast, previous studies showed structures of TCRs for an artificial ligand from genetically different mice47 or from different individuals27, or TCRs that reflect a limited repertoire for the viral epitopes at the individual or population level28,29. Deep sequencing of the TRBV gene repertoire has been reported for some viral epitopes25, and TRAV gene usage has been analyzed in the CMV-specific responses of two donors26. However, there are no reports in which human viral-antigen-specific TCR repertoires have been characterized for both TRBV and TRAV sequences by NGS. The analyses reported here reveal a combination of highly diverse public and private repertoires that may be the prototype of a highly successful and resilient response likely to be present in all HLA-A2+ individuals. Increasing evidence suggests that an antigen-specific TCR-repertoire organization with focused diversity (that is, with dominant public clonotypes combined with an underlying highly diverse private component) may be more common for many antigens than previously thought. For instance, TCR repertoires for two featured epitopes (one from CMV pp65 in HLA-A2+ donors, and one from EBV EBNA 3A in HLA-B8+ donors) were once considered highly public oligoclonal responses, but now have been shown by NGS to also contain an underlying polyclonal repertoire25,26,27. On the basis of our observations of the repertoire that responds to HLA-A2–M1, we would expect that the dominant public clonotypes might represent T cell clones preferentially selected by convergent recombination and/or that use a small number of TCR residues to contact peptide–MHC. 
Similarly, diverse private clonotypes might represent T cell clones that have more stringent contact requirements, which could be fulfilled in many different ways with various TCRα and TCRβ sequences. Contrary to our expectations, the relatively featureless M1 peptide was recognized by many different TCRs that used different recognition strategies. It is likely that other peptides with more features also are recognized in different ways by different TCRs, and the relationship between TCR dominance patterns and peptide–MHC contacts observed for TCRs that respond to HLA-A2–M1 might hold for many other antigens.

A highly diverse repertoire, such as the one described here that recognizes HLA-A2–M1, should allow resilience against the loss of individual clonotypes with aging32 and against skewing of the response after infection with a cross-reactive pathogen48,49. The large number of HLA-A2–M1-specific clonotypes contributes to the overall memory T cell pool, thereby enhancing the opportunity for protective heterologous immunity that is now recognized as an important aspect of immune maturation50,51. A large pool of TCR clonotypes responding to HLA-A2–M1 could provide increased resistance to viral drift, although the xRS/A-containing JM22 TCR recently has been shown to be able to recognize M1 variants from circulating IAV strains52. Finally, it is possible that different TCRs activate antigen-specific cell functions differently, and thus lead to a more functionally heterogeneous and more complete pool of memory cells53. A better understanding of TCR repertoires is becoming increasingly important, as suggested by reports that the diversity index of the mucosal resident T cell repertoire predicts clinical prognosis in gastric cancer54. Ideally, vaccines would be able to induce dominant public as well as diverse private responses to provide a resilient repertoire of memory cells.

By combining structural and sequence information, we have now obtained the most comprehensive and highly detailed view of CD8+ T cell recognition for any known antigen. Thousands of different TCR sequences representing the bulk of the public HLA-A2–M1-restricted CD8+ T cell response can be understood in terms of the interactions identified in the TCR structural prototypes JM22, LS10, and LS01. The remaining idiosyncratic repertoire includes highly diverse TCR sequences that provide resiliency against clonal loss, diversion, and pathogen variation.

Methods

Study population.

Blood samples were collected from six HLA-A201 donors. Donors 185, 215, 240, and 264 were healthy donors between the ages of 18 and 20, and D085 and D105 were healthy middle-aged donors. All donors were volunteers from the University of Massachusetts (UMass) Student Health Services (Amherst, Massachusetts, USA) or UMass Medical Center (Worcester, Massachusetts, USA). HLA status was assessed by assay with an HLA-A2-specific mAb (BB7.2; BD Biosciences, San Jose, CA). Donors of this age are assumed to have been exposed to IAV, and all exhibited positive staining with HLA-A2–M1 tetramers, indicating that they had been previously exposed30. The Institutional Review Board committee from UMass Medical School in Worcester, Massachusetts, approved this study, and all donors who participated in this study gave informed consent. No statistical method was used to predetermine the sample size. The experiments were not randomized and were not performed with blinding.

Peptide synthesis.

The following HLA-A2-specific peptides were synthesized by 21st Century Biochemicals (Marlboro, MA) and purified to 90% purity: IAV M1 58–66 (GILGFVFTL); two EBV peptides, BMLF1 280–288 (GLCTLVAML) and BRLF1 109–117 (YVLDHLIVV); human tyrosinase peptide 369–377 (YMDGTMSQV); and vaccinia virus MVA090 (KLTFLVEV).

Blood preparation and bulk CD8+ T cell culture.

PBMCs were isolated from fresh blood samples with Ficoll Paque Plus (Amersham Biosciences, Piscataway, NJ). CD8+ T cells were purified from PBMCs by positive selection with human CD8-specific MicroBeads (Miltenyi Biotec, USA). CD8+ T cells (2.5 × 10^5 per ml) were stimulated with peptide-pulsed (1 μM) irradiated TAP-deficient T2 cells (5 × 10^4 per ml) (CRL-1992; ATCC). T cell lines were fed every 3–4 d with AIM-V medium supplemented with 14% human serum, 16% MLA-144 culture supernatant, 10 U/ml recombinant IL-2, 1% L-glutamine, 0.0005% β-mercaptoethanol (Sigma-Aldrich), and 1% HEPES (HyClone)30. At the end of each week T cells were counted and re-stimulated with peptide-pulsed irradiated T2 cells for a total period of 3 weeks. In previous experiments that used this in vitro protocol to expand antigen-specific CD8+ T cell populations, TCR repertoires were comparable to those observed by tetramer sorting directly ex vivo from PBMCs55. Because the expansion step can introduce some skewing of the repertoire10, we evaluated the relative frequency of 18 TRBV sequence families directly ex vivo, by antibody-staining flow cytometry, and after in vitro expansion, by NGS. No significant skewing was detected (Supplementary Fig. 1a).

Staining and sorting of CD8+ T cells.

Positive staining with HLA-A2–M1 dextramer (Immudex, USA) was used as an indication that the donors had been exposed to influenza virus. Magnetic-bead-purified CD8+ T cells from donor 085 were stained with HLA-A2–M1 dextramer and anti-CD8/anti-CD3. M1+CD3+CD8+ cells were directly sorted into a 96-well plate (Bio-Rad) with an Aria II flow cytometer (BD). The plate was kept at −80 °C until further processing. The same staining procedure was used for bulk M1-specific CD8+ T cell isolation from cultured cytotoxic T lymphocytes, and sorted cells were collected in tubes containing PBS buffer with 2% FBS. After a quick spin, cells were resuspended in lysis buffer (RNeasy kit, Qiagen) and stored at −80 °C for up to 1 month before RNA isolation.

TCR Vβ analysis ex vivo with mAb.

Sorted CD8+ T cells from fresh PBMCs directly ex vivo were incubated for 20 min with IAV-M1-specific tetramer, which was then washed off. An additional 20-min incubation was performed with 24 TCR Vβ antibodies that cover >70% of commonly used human Vβ (IO Test Beta Mark TCR Vβ Repertoire Kit, Beckman Coulter, Fullerton, CA). Samples were read on an LSRII flow cytometer (BD Biosciences).

RNA isolation, cDNA synthesis, and NGS of bulk M1-specific cells.

We isolated total RNA from lysates of cultured and sorted CD8+ T cells with the RNeasy mini kit (Qiagen), according to the manufacturer's recommendations. TCRα and TCRβ CDR3 regions were amplified and sequenced from cDNA reverse-transcribed from 100–300 ng of total RNA samples (SuperScript VILO cDNA synthesis kit, Invitrogen). Amplification and high-throughput sequencing of CDR3 regions were done on the ImmunoSEQ platform at Adaptive Biotechnologies as previously described56. The platform uses a panel of multiplexed TRV and TRJ primers selected to reduce the differential amplification bias of TCRα and TCRβ sequences56. The use of cDNA as a source of TCRα and TCRβ templates potentially can introduce bias as a result of differential mRNA expression, reverse transcription, or PCR amplification. These effects generally are expected to introduce errors in relative-abundance calculations of approximately two-fold or less57,58, although a recent analysis indicated that there is potential for much greater skewing with a different reverse-transcription protocol59. Such skewing would affect the total sequence numbers reported here, but not the numbers of unique clonotypes.

Multiplex nested single-cell RT-PCR.

Single cells sorted into 96-well plates were subjected to cDNA synthesis with the SuperScript VILO cDNA synthesis kit (Invitrogen) in a 2.5-μl reaction mixture containing 0.1% Triton X-100 (Sigma). Multiplex V-gene specific primer sets were used in two rounds of PCR to amplify CDR3α/β from single cells as previously described19. CDR3 amplicons were purified (ExoSAP-IT) and sequenced with primers that recognized constant regions of TRAC and TRBC19. Sanger DNA sequencing was performed by Genewiz (Cambridge, MA).

Identifying CDR3 sequences and NGS analysis.

The TCRα and TCRβ CDR3 sequences were identified according to the definition established by the International ImMunoGeneTics collaboration60. NGS data were analyzed with ImmunoSEQ Analyzer 2.0, provided by Adaptive Biotechnologies (http://www.adaptivebiotech.com/immunoseq/analyzer). Single-cell CDR3 sequences were analyzed by IMGT/V-QUEST61. Only productively rearranged TCRα and TCRβ sequences without stop codons were used for repertoire analyses, including sequence-composition and gene-frequency analyses. V gene frequencies, CDR3 length and V/J gene pairing were analyzed with subprograms of the ImmunoSEQ Analyzer software and further processed in Microsoft Excel. Conserved motifs in CDR3 were assessed with the Weblogo software (http://weblogo.berkeley.edu/logo.cgi). Clonotype diversity for each donor was evaluated with the Shannon diversity index, $H = -\sum_i p_i \ln p_i$, and the Simpson diversity index, $SI = 1 - \sum_i p_i^2$, where $p_i$ is the fractional abundance of clonotype $i$.
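The two diversity indices defined above are straightforward to compute from per-clonotype read counts. The following is a minimal sketch, not the ImmunoSEQ Analyzer implementation; the function names and the example counts are hypothetical.

```python
from math import log

def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over clonotypes with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * log(c / total) for c in counts if c > 0)

def simpson_index(counts):
    """Simpson diversity SI = 1 - sum(p_i ** 2)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

# Hypothetical read counts for the clonotypes of one donor (illustration only).
reads_per_clonotype = [500, 300, 150, 50]
print(round(shannon_index(reads_per_clonotype), 3))
print(round(simpson_index(reads_per_clonotype), 3))
```

Both indices increase with the number of clonotypes and with the evenness of their abundances: a repertoire dominated by a single clone gives values near 0, whereas N equally abundant clonotypes give H = ln N and SI = 1 − 1/N.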

Cloning of M1-specific full-length TCRα and TCRβ chains.

The multiplex nested single-cell PCR strategy described above identified paired CDR3α/β sequences and information on V gene usage, but to clone full-length TCRα and TCRβ we needed to isolate V gene sequences upstream of the CDR3, and C gene sequences downstream. We prepared individual full-length TCRα and TCRβ chain genes by ligating 5′ fragments that covered the signal sequence of the V gene to part of CDR3 and 3′ fragments corresponding to CDR3 and the termination codon of the TRC gene. As template DNA, we used cDNA from remaining PBMCs after CD8+ T cell separation for amplification of 5′ and 3′ fragments. Primers were designed in such a way that the 3′ end of the amplified 5′ fragment would overlap the 5′ end of the amplified 3′ fragment. More specifically, for 5′-fragment amplification, the forward primers included NdeI and EcoRI sites for TCRα and TCRβ, respectively, and sequences that could anneal to the signal sequence of each V gene, and the reverse primer contained sequence 5′ of CDR3 that could anneal to the 3′ end of the germline V gene. For 3′-fragment amplification, the forward primers were designed to have sequence 3′ of CDR3 and the germline J gene for annealing, and the reverse primers included sequence intended to anneal to the 3′ end of the C gene and BamHI and BspEI sites for TCRα and TCRβ, respectively. The 5′ fragments and 3′ fragments were amplified with Phusion enzyme (NEB) and were gel-purified for subsequent overlapping PCR.
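The logic of joining a 5′ fragment and a 3′ fragment through their shared CDR3-spanning overlap can be checked in silico before primers are ordered. The sketch below is an illustration of that idea only, not the authors' primer-design procedure: the function name, the 15-nucleotide minimum overlap, and both sequences are hypothetical.

```python
def join_by_overlap(five_prime, three_prime, min_overlap=15):
    """Join two same-strand fragments at the longest suffix/prefix overlap."""
    longest = min(len(five_prime), len(three_prime))
    for n in range(longest, min_overlap - 1, -1):
        if five_prime.endswith(three_prime[:n]):
            return five_prime + three_prime[n:]
    raise ValueError("fragments do not share an overlap of the required length")

# Hypothetical sequences for illustration (not real TCR gene sequences).
shared = "TGTGCCAGCAGTATC"              # 15-nt region present in both fragments
five_fragment = "ATGGCC" + shared       # 5' fragment: upstream sequence + overlap
three_fragment = shared + "GAGCAGTAC"   # 3' fragment: overlap + downstream sequence

full_length = join_by_overlap(five_fragment, three_fragment)
print(full_length)
```

A failed join (the `ValueError` branch) would indicate that the reverse primer of the 5′ fragment and the forward primer of the 3′ fragment do not cover a common stretch long enough for overlap extension.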

Construction of TCRα/β expression vector.

Full-length TCRα and TCRβ from overlapping PCR were cloned into a mammalian expression vector that contained eGFP protein (pEF1-IRES-eGFP, CLONTECH). The 'self-cleaving' 2A sequence of foot and mouth disease virus was inserted between TCRα and TCRβ to express two chains simultaneously62. EcoRI, BspEI, NdeI and BamHI enzyme sites were used to insert TCRβ, 2A sequence, and TCRα (EcoRI-TCRβ-BspEI-2A-NdeI-TCRα-BamHI) into the vector. To generate mutants of LS01 and LS10, we used the DpnI-mediated site-directed mutagenesis method. Full-length sequencing of inserts of the wild type and mutants was carried out to confirm the absence of unwanted mutation.

Transient expression of TCR in TCR-deficient J76-CD8 cells.

TCRα/β expression vectors were transferred via electroporation into TCR-deficient and CD8α-expressing Jurkat cells (J76-CD8α cells)63 kindly provided by Dr. Wolfgang Uckert (Max Delbruck Center). J76-CD8α cells were maintained in RPMI 1640 media containing 10% FBS in early log phase for transfection. Four million J76-CD8α cells (10^7 per ml) were harvested and mixed with 10–15 μg of expression vector plasmid in a 4-mm gap cuvette (Bio-Rad) for electroporation. Transfection was carried out with a BTX electroporator (260 V, 1,050 μF). Thirty-six hours after transfection, cells were harvested for MHC-tetramer staining.

Construction of TCR-expressing stable cell lines.

For T cell activation (CD69 upregulation) assays, stable cell lines individually expressing the 13 different TCRα/β pairs were constructed. Linearized TCR expression vectors were transferred into J76-CD8α cells through electroporation as described above. Two days after transfection, cells were transferred and selected in 400 μg/ml of G418-containing media for 3 weeks. For LS01, LS06, LS10 or JM22, G418-resistant cells were further diluted and transferred into 96-well plates at a density of 0.5 cells per well for single-cell cloning. G418-resistant single-cell clones were first screened for GFP expression to isolate expression-vector-containing clones, and were then further screened for TCR expression.
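The choice of 0.5 cells per well for single-cell cloning can be rationalized with Poisson statistics. The calculation below is a sketch under two assumptions that are not stated in the text: cells are distributed among wells as a Poisson process, and every seeded cell grows out.

```python
from math import exp, factorial

def poisson_pmf(k, mean):
    """Probability of exactly k cells in a well for a given mean number of cells per well."""
    return mean ** k * exp(-mean) / factorial(k)

mean_cells_per_well = 0.5  # seeding density given in the text

p_empty = poisson_pmf(0, mean_cells_per_well)
p_single = poisson_pmf(1, mean_cells_per_well)
p_multiple = 1.0 - p_empty - p_single
# Among wells that contain any cells, the fraction that started from exactly one cell.
p_clonal_given_growth = p_single / (1.0 - p_empty)

print(f"empty wells:            {p_empty:.1%}")
print(f"single-cell wells:      {p_single:.1%}")
print(f"multi-cell wells:       {p_multiple:.1%}")
print(f"clonal if growth seen:  {p_clonal_given_growth:.1%}")
```

Under these assumptions roughly 61% of wells are empty, about 30% receive one cell, and about 77% of the wells that show growth derive from a single cell, which is why subsequent screening of individual clones remains necessary.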

Preparation of peptide–HLA-A2 monomer/tetramer.

We prepared peptide–HLA-A2 monomers by folding urea-solubilized bacterially expressed inclusion bodies of HLA-A2 heavy chain and human β2-microglobulin in the presence of 5 mg/L M1 peptide or control peptide MVA090 as described64. In some cases the modified HLA-A2 heavy chain with a free C-terminal cysteine at position 282 was used to add biotin via thiol chemistry for tetramer preparation and Biacore experiments. The folding mixture was filtered with a 0.2-μm filter unit (Corning) and buffer was exchanged with 10 mM Tris-Cl, pH 8.0, using a tangential flow concentrator. Folded HLA-A2–peptide complexes were isolated from the buffer-exchanged folding mixture by a series of chromatography steps consisting of Hitrap Q and Mono Q ion exchange and S-200 gel-filtration columns (GE Healthcare). For Biacore experiments and tetramer preparation, purified cysteine-containing HLA-A2–peptide monomers were reduced with 5 mM DTT and biotinylated using EZ-Link maleimide-PEG2-biotin (Thermo Scientific). Biotinylated M1 monomers were multimerized with R-phycoerythrin (PE)-labeled streptavidin by mixing at a final ratio of 5:1 (monomer:streptavidin).

Staining and flow cytometry of TCR-expressing J76-CD8α cells.

TCR-expressing J76-CD8 cells were washed twice with FACS buffer (PBS with 2% BSA, 0.02% azide), and 0.2–0.5 million cells were stained in 100 μl of staining solution containing LIVE/DEAD fixable violet dead cell stain, anti-TCR (IP26; BioLegend) and HLA-A2–M1 dextramer or control dextramer (HLA-A2–BRLF1 dextramer; Immudex) for 30 min at room temperature. For dose-dependent tetramer staining, increasing amounts of HLA-A2–peptide tetramers (300 nM, 120 nM, 48 nM, 19.2 nM, 7.68 nM, 3.07 nM) were added instead of dextramers. Stained cells were washed three times with FACS buffer for flow cytometry. Samples were analyzed with an LSRII flow cytometer (BD Biosciences) and FlowJo software (Tree Star).

CD69-upregulation assay.

To assess early TCR-driven signaling, we stimulated J76-CD8 cells stably expressing TCRs with peptide-pulsed HLA-A2-transfected T2 cells65. T2-A2 cells were loaded with a single dose (1 μM) of M1 peptide or with varying concentrations of M1 peptide (10^−12 M to 10^−7 M) for 1 h at 37 °C. We removed unloaded excess M1 peptide by washing the cells twice with PBS. M1-loaded T2-A2 cells (0.1 × 10^6 cells) were incubated with TCR-expressing J76-CD8 cells (0.4 × 10^6 cells) at 37 °C in 24-well dishes for 12 h. After incubation, cells were cooled on ice and harvested for FACS staining. We stained 0.2–0.5 million cells with anti-CD69 (FN50; BioLegend), anti-TCR (IP26; BioLegend), and LIVE/DEAD Fixable Violet dye for 30 min at 4 °C. Stained cells were washed with PBS and analyzed by FACS as described above.

Preparation of soluble TCRs.

For Biacore experiments and crystallization, we engineered extracellular portions of TCRα and TCRβ chains of LS01 and LS10 as stable soluble TCRs (sTCRs) by introducing an interchain disulfide as previously described66. Engineered TCRα and TCRβ chains were expressed as inclusion bodies. Urea-solubilized TCRα and TCRβ inclusion bodies were mixed and folded by dilution and dialysis as described67. Dialyzed folded sTCRs were purified by successive chromatography with Hitrap Q (5 ml), Mono Q (5 ml) and Superdex 26/600 (320 ml) columns. Final gel-filtration chromatography was used to exchange buffers for crystallization (10 mM Tris-Cl, pH 8.0, 50 mM NaCl, 1 mM EDTA, 0.02% NaN3) or Biacore experiments (10 mM HEPES, pH 7.4, 150 mM NaCl, 3 mM EDTA, 0.005% Tween 20).

Surface plasmon resonance analysis.

Neutravidin (3,000 RU) was immobilized in four flow cells of a CM5 chip at pH 5.5 via amine-coupling reaction in a Biacore 3000 instrument (BIAcore AB). Biotinylated HLA-A2–M1 was bound to immobilized neutravidin in flow cells 3 and 4 to achieve 1,200 RU. Neutravidin alone in flow cell 1 and HLA-A2–MVA090-bound neutravidin in flow cell 2 were used as negative controls. Solutions of LS01 and LS10 sTCR (two-fold dilutions from 150 μM to 1.17 μM) were injected over all four flow cells at 5 μl/min for 3 min to allow equilibrium binding, and then the flow was returned to running buffer for dissociation. Running buffer was injected at 50 μl/min for 25 min to regenerate the flow cells. We calculated the increased RU signal from specific binding of LS01 and LS10 to immobilized HLA-A2–M1 by subtracting the RU from the flow cell immobilized with irrelevant peptide–HLA-A2 complex and plotted that value against the sTCR concentration to calculate equilibrium dissociation constants (Kd), using Prism 6 software (GraphPad Software, Inc.).
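The equilibrium analysis described above can be illustrated with a small fitting routine. The binding model used in Prism is not specified in the text, so this sketch assumes a one-site model, RU = Rmax × C / (Kd + C), and fits it by least squares with a golden-section search over log10(Kd). The function names and the synthetic, noise-free responses (generated from a hypothetical Kd of 20 μM and Rmax of 500 RU) are illustrative assumptions, not measured data.

```python
from math import log10, sqrt

def dilution_series(top, fold, n):
    """Concentrations for an n-point serial dilution starting at `top`."""
    return [top / fold ** i for i in range(n)]

def best_rmax(conc, resp, kd):
    """For a fixed Kd the model is linear in Rmax, so least squares has a closed form."""
    f = [c / (kd + c) for c in conc]
    return sum(fi * r for fi, r in zip(f, resp)) / sum(fi * fi for fi in f)

def sse(conc, resp, kd):
    """Sum of squared residuals at a given Kd, using the best Rmax for that Kd."""
    rmax = best_rmax(conc, resp, kd)
    return sum((r - rmax * c / (kd + c)) ** 2 for c, r in zip(conc, resp))

def fit_kd(conc, resp, lo=1e-2, hi=1e4, iters=200):
    """Golden-section search for the Kd (same units as conc) that minimizes the SSE."""
    a, b = log10(lo), log10(hi)
    g = (sqrt(5) - 1) / 2
    for _ in range(iters):
        x1, x2 = b - g * (b - a), a + g * (b - a)
        if sse(conc, resp, 10 ** x1) < sse(conc, resp, 10 ** x2):
            b = x2
        else:
            a = x1
    kd = 10 ** ((a + b) / 2)
    return kd, best_rmax(conc, resp, kd)

# Eight two-fold dilutions from 150 uM reach 1.17 uM, matching the series in the text.
conc_uM = dilution_series(150.0, 2.0, 8)
# Synthetic reference-subtracted responses from hypothetical parameters (illustration only).
true_kd, true_rmax = 20.0, 500.0
resp_RU = [true_rmax * c / (true_kd + c) for c in conc_uM]

kd_fit, rmax_fit = fit_kd(conc_uM, resp_RU)
print(f"Kd = {kd_fit:.2f} uM, Rmax = {rmax_fit:.1f} RU")
```

With real sensorgram data the responses would be the reference-subtracted equilibrium RU values at each concentration, and the search assumes the error surface has a single minimum within the chosen Kd bounds.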

Crystallization and data collection.

Purified sTCRs and HLA-A2–M1 (without free C-terminal cysteine) were mixed at final concentration of 10–15 mg/ml at a 1:1 molar ratio overnight to preform TCR–MHC–peptide ternary complexes. We set up all crystallization conditions via a sitting-drop vapor-diffusion technique in 96-well plates by mixing 0.5 μl of reservoir buffer and 0.5 μl of protein mixture at room temperature. The crystallization plates were stored at room temperature or 4 °C. One buffer condition (14% (w/v) PEG 4000, 100 mM Na-HEPES, pH 7.0, 200 mM ammonium sulfate) gave crystal plates of LS01–HLA-A2–M1 complex grown at 4 °C. Crystals of LS10–HLA-A2–M1 complex grew at 21 °C as plates in a buffer (10% (w/v) PEG 8000, 100 mM Tris-Cl, pH 7.0, 200 mM magnesium chloride) from the Wizard Classic crystallization screen (Rigaku). Crystals were briefly soaked in a 1:1 mixture of saturated sucrose and reservoir buffer for cryoprotection and were subsequently flash-frozen in liquid nitrogen and sent to the LRL-CAT beamline at the Advanced Photon Source (Argonne, IL, USA). Two data sets (2.06 Å and 2.48 Å for LS01 and LS10, respectively) were collected with 0.979-Å wavelength radiation and a MAR-165 charge-coupled device (CCD) detector. Data sets were indexed, integrated, and scaled with iMOSFLM and SCALA68,69,70. The LS01-complex crystal belonged to the P21 space group, and the LS10-complex crystal to the P1 space group (two molecules per asymmetric unit). Detailed data-collection statistics and unit cell parameters are shown in Table 1.

Structure determination and refinement.

Both TCR structures were determined by molecular replacement with Phaser71,72 and a sculpted JM22–HLA-A2–M1 structure (PDB 1OGA) as an initial search model with two separate search ensembles, in which the MHC–peptide was located first, followed by the TCR. The LS01 complex contained one pMHC and one TCR in the asymmetric unit, and the LS10 complex contained two pMHCs and two TCRs in the asymmetric unit. After one round of rigid body refinement using separate structural domains (HLA-A2 α1 and α2, α3, β2m, peptide, Vα, Cα, Vβ and Cβ) of the output model from molecular replacement, models were built and refined by the AutoBuild component of PHENIX73. Composite omit maps of both models clearly showed densities for peptide, MHC, and TCR. Autobuilt models were further refined with several rounds of manual model building using the software COOT74 and automated refinement cycles using the phenix.refine program with diverse parameter adjustments (xyz coordinates, real space, rigid body, simulated annealing (torsion), individual restrained B factor), and NCS restraints for LS10. The final LS01–HLA-A2–M1 model had 96.9% of residues in favored regions of the Ramachandran plot, 3.1% in allowed regions, and no outliers, and it included MHC heavy chain residues 1–275, β2-microglobulin residues 2–99, complete M1 peptide residues 1–9, TCRα subunit residues 2–201, and TCRβ subunit residues 3–243. Side chains of the following residues were modeled, but with less confidence: 222–226 of HLA-A2; 58, 83, 127–131, and 142 of TCRα; and 43, 118, 180–185, and 219–222 of TCRβ. Side chains of residues 126 and 129 of TCRα and 118, 219, 220, and 222 of TCRβ were not resolved.
The final LS10–HLA-A2–M1 model had two molecules in the asymmetric unit with 97.1% of residues in favored regions of the Ramachandran plot, 2.9% in allowed regions, and no outliers, and it included two MHC heavy chain residues 1–276, two β2-microglobulin residues 0–99, two complete M1 peptide residues 1–9, two TCRα subunit residues 3–132 and 138–208, and two TCRβ subunit residues 4–242. Residues 133–137 of two TCRα chains were not built, and the following side chains were not resolved: two of residue 48 of β2-microglobulin; two each of residues 85, 115, and 117 of TCRβ; residue 268 of MHC heavy chain; and residue 137 of TCRα. The following loops were modeled, but with less confidence: 190–200, 218–228, 246–258, and 272–276 of MHC heavy chains; 54–62 and 154–158 of TCRα chains; and 179–183 and 215–230 of TCRβ chains. Also, the electron density of the side chains of the following residues was very weak: residue 268 of MHC heavy chains; residues 19, 44, 58, 6, 74, and 75 of β2-microglobulins; residues 54, 121, 171, and 187 of TCRα chains; and residues 14, 16, 17, 72, 86, and 163 of TCRβ chains. Diffraction data and coordinates were deposited in the PDB with accession codes 5ISZ and 5JHD.

Structure analysis.

We used PyMOL (The PyMOL Molecular Graphics System, Version 1.8, Schrödinger, LLC) for graphical representation of TCR footprint and interacting residues. Buried surface area (BSA) between MHC/peptide and TCR and the predicted interfacial binding energy were analyzed with the PISA server (http://www.ebi.ac.uk/msd-srv/prot_int/cgi-bin/piserver). Individual contributions of TCRα, TCRβ, α1 and α2 of MHC, and peptide to BSA were plotted in bar graphs. Residue-residue interactions were assessed with the contact map analysis server (http://ligin.weizmann.ac.il/cma/). We analyzed BSAs of HLA-A2-bound peptides with the PISA server, using the following PDB models: 1B0G, 1DUZ, 1EEY, 1EEZ, 1HHG, 1HHH, 1HHI, 1HHJ, 1HHK, 1I1F, 1I1Y, 1I4F, 1I7R, 1I7T, 1I7U, 1IM3, 1JF1, 1JHT, 1P7Q, 1QEW, 1QR1, 1S8D, 1S9W, 1S9X, 1S9Y, 1T1W, 1T1X, 1T1Y, 1T1Z, 1T20, 1T21, 1T22, 1TVB, 1TVH, 2AV1, 2AV7, 2C7U, 2CLR, 2GIT, 2GT9, 2GTW, 2GTZ, 2GUO, 2V2W, 2V2X, 2VLL, 2X4O, 2X4R, 2X4P, 2X4S, 2X4T, 2X4U, 2X70, 3BGM, 3BH8, 3BH9, 3BHB, 3FQR, 3FQT, 3FQU, 3FQW, 3FQX, 3FT2, 3FT3, 3FT4, 3GIV, 3GSO, 3GSQ, 3GSR, 3GSU, 3GSV, 3GSW, 3GSX, 3H7B, 3H9H, 3HPJ, 3I6G, 3I6K, 3IXA, 3KLA, 3MGO, 3MGT, 3MR9, 3MRB, 3MRC, 3MRD, 3MRF, 3MRG, 3MRH, 3MRI, 3MRJ, 3MRK, 3MRL, 3MRM, 3MRN, 3MRO, 3MRP, 3MRQ, 3MRR, 3MYJ, 3O3A, 3O3B, 3O3D, 3O3E, 3PWJ, 3PWL, 3PWN, 3QFD, 3REW, 3TO2, 3UTQ, 3V5D, 3V5H, 3V5K, 4E5X, 4GKN, 4GKS, 4I4W, 4JFO, 4JFP, 4JFQ, 4K7F, 4NNX, 4NNY, 4NO2, 4NO3, 4NO5, 4UQ3, and 4WJ5. The existence of interaction between Arg157MHC and TCRs in 35 TCR–HLA-A2–peptide complexes was examined in the following PDB models: 1AO7, 1BD2, 1LP9, 1OGA, 1QRN, 1QSE, 1QSF, 2BNQ, 2BNR, 2GJ6, 2J8U, 2JCC, 2UWE, 2VLJ, 2VLK, 2VLR, 3D3V, 3GSN, 3H9S, 3HG1, 3O4L, 3PWP, 3QDG, 3QDJ, 3QDM, 3QEQ, 3QFJ, 3UTS, 4EUP, 4FTV, 4L3E, 4QOK, 5D2L, and 5D2N.

Statistical analyses.

Nonlinear least-squares curve fitting and statistical analyses were done with Prism version 6 (GraphPad Software, Inc.). Statistical tests included Student's t test, linear regression, Pearson correlation coefficient, repeated-measures two-way analysis of variance, and Wilcoxon matched-pairs signed-rank test.

Data availability.

Atomic coordinates and crystallographic structure factors for the LS01–HLA-A2–M1 and LS10–HLA-A2–M1 complexes have been deposited in the Protein Data Bank under accession codes 5ISZ and 5JHD. TCR sequence data are available from the ImmuneACCESS database (http://doi.org/10.21417/B7W88F). Source data for Figures 1, 3, 5, and 6 are available with the paper online. Other data from this study are available from the corresponding authors upon reasonable request.