T cell receptor (TCR) sequences are very diverse, with many more possible sequence combinations than T cells in any one individual (refs 1–4). Here we define the minimal requirements for TCR antigen specificity, through an analysis of TCR sequences using a panel of peptide and major histocompatibility complex (pMHC)-tetramer-sorted cells and structural data. From this analysis we developed an algorithm that we term GLIPH (grouping of lymphocyte interactions by paratope hotspots) to cluster TCRs with a high probability of sharing specificity owing to both conserved motifs and global similarity of complementarity-determining region 3 (CDR3) sequences. We show that GLIPH can reliably group TCRs of common specificity from different donors, and that conserved CDR3 motifs help to define the TCR clusters that are often contact points with the antigenic peptides. As an independent validation, we analysed 5,711 TCRβ chain sequences from reactive CD4 T cells from 22 individuals with latent Mycobacterium tuberculosis infection. We found 141 TCR specificity groups, including 16 distinct groups containing TCRs from multiple individuals. These TCR groups typically shared HLA alleles, allowing prediction of the likely HLA restriction, and a large number of M. tuberculosis T cell epitopes enabled us to identify pMHC ligands for all five of the groups tested. Mutagenesis and de novo TCR design confirmed that the GLIPH-identified motifs were critical and sufficient for shared-antigen recognition. Thus the GLIPH algorithm can analyse large numbers of TCR sequences and define TCR specificity groups shared by TCRs and individuals, which should greatly accelerate the analysis of T cell responses and expedite the identification of specific ligands.
Arstila, T. P. et al. A direct estimate of the human αβ T cell receptor diversity. Science 286, 958–961 (1999)
Davis, M. M. & Bjorkman, P. J. T-cell antigen receptor genes and T-cell recognition. Nature 334, 395–402 (1988)
Qi, Q. et al. Diversity and clonal selection in the human T-cell repertoire. Proc. Natl Acad. Sci. USA 111, 13139–13144 (2014)
Shortman, K., Egerton, M., Spangrude, G. J. & Scollay, R. The generation and fate of thymocytes. Semin. Immunol. 2, 3–12 (1990)
Rudolph, M. G., Stanfield, R. L. & Wilson, I. A. How TCRs bind MHCs, peptides, and coreceptors. Annu. Rev. Immunol. 24, 419–466 (2006)
Berman, H. M. et al. The Protein Data Bank. Nucleic Acids Res. 28, 235–242 (2000)
Warren, R. L. et al. Exhaustive T-cell repertoire sequencing of human peripheral blood samples reveals signatures of antigen selection and a directly measured repertoire size of at least 1 million clonotypes. Genome Res. 21, 790–797 (2011)
Rubelt, F. et al. Individual heritable differences result in unique cell lymphocyte receptor repertoires of naive and antigen-experienced cells. Nat. Commun. 7, 11112 (2016)
Li, W. & Godzik, A. CD-HIT: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22, 1658–1659 (2006)
Song, I. et al. Broad TCR repertoire and diverse structural solutions for recognition of an immunodominant CD8+ T cell epitope. Nat. Struct. Mol. Biol. 24, 395–406 (2017)
Chattopadhyay, P. K., Yu, J. & Roederer, M. A live-cell assay to detect antigen-specific CD4+ T cells with diverse cytokine profiles. Nat. Med. 11, 1113–1117 (2005)
Frentsch, M. et al. Direct access to CD4+ T cells specific for defined antigens according to CD154 expression. Nat. Med. 11, 1118–1124 (2005)
Lindestam Arlehamn, C. S. et al. A quantitative analysis of complexity of human pathogen-specific CD4 T cell responses in healthy M. tuberculosis-infected South Africans. PLoS Pathog. 12, e1005760 (2016)
Han, A., Glanville, J., Hansmann, L. & Davis, M. M. Linking T-cell receptor sequence to functional phenotype at the single-cell level. Nat. Biotechnol. 32, 684–692 (2014)
Lindestam Arlehamn, C. S. et al. Memory T cells in latent Mycobacterium tuberculosis infection are directed against three antigenic islands and largely contained in a CXCR3+CCR6+ TH1 subset. PLoS Pathog. 9, e1003130 (2013)
Sallusto, F. Heterogeneity of human CD4+ T cells against microbes. Annu. Rev. Immunol. 34, 317–334 (2016)
Sharma, G. & Holt, R. A. T-cell epitope discovery technologies. Hum. Immunol. 75, 514–519 (2014)
Wang, P. et al. Peptide binding predictions for HLA DR, DP and DQ molecules. BMC Bioinformatics 11, 568 (2010)
Mahomed, H. et al. Predictive factors for latent tuberculosis infection among adolescents in a high-burden area in South Africa. Int. J. Tuberc. Lung Dis. 15, 331–336 (2011)
Han, A. et al. Dietary gluten triggers concomitant activation of CD4+ and CD8+ αβ T cells and γδ T cells in celiac disease. Proc. Natl Acad. Sci. USA 110, 13073–13078 (2013)
Glanville, J. et al. Naive antibody gene-segment frequencies are heritable and unaltered by chronic lymphocyte ablation. Proc. Natl Acad. Sci. USA 108, 20066–20071 (2011)
Glanville, J. et al. Precise determination of the diversity of a combinatorial antibody library gives insight into the human immunoglobulin repertoire. Proc. Natl Acad. Sci. USA 106, 20216–20221 (2009)
Hathaway, N. A. et al. Dynamics and memory of heterochromatin in living cells. Cell 149, 1447–1460 (2012)
We thank the Stanford Human Immune Monitoring Center for high-throughput sequencing support, and M. Mindrinos and co-workers at Sirona Genomics for the HLA typing. We especially thank The Bill and Melinda Gates Foundation, The National Institutes of Health (2U19 AI057229) and the Howard Hughes Medical Institute for financial support, S.-A. Xue for providing the Jurkat 76 T cell line, C.-Y. Chang and R. Taniguchi for helping with lentiviral transduction, R. Hovde for valuable discussions regarding statistical measures, H. Mahomed, W. Hanekom and members of the Adolescent Cohort Study (ACS) group for enrolment and follow-up of the M. tuberculosis-infected adolescents, and R. Bedi for collecting TCR sequences from the literature. Sorting was (partially) performed in the Shared FACS Facility obtained using NIH S10 Shared instrument grant (S10RR025518-01).
The authors declare no competing financial interests.
Reviewer Information Nature thanks B. Chain, R. Holt and the other anonymous reviewer(s) for their contribution to the peer review of this work.
Extended data figures and tables
Extended Data Figure 1 TCRs specific to common antigens show motifs within a limited region of CDR residues with high structural contact propensity.
a, Probability of IMGT TCR CDR positions being within 5 Å of peptide antigen, as tabulated from 52 published crystal structures of TCR–pMHC interactions (Supplementary Table 2), and displayed as a heat map on representative TCR 2j8u. Positions with less than 25% contact probability are shown in black. b, Alignment of 52 non-redundant (<95% amino acid identity between any pair) TCR sequences from TCR–pMHC PDB structure complexes. Positions within 5 Å of peptide antigen are indicated in dark blue. A linear set of 3–5 amino acids in CDR3β is observed in contact in almost every structure, with TCRβ–CDR3 IMGT positions 108–111 being in contact in 90% of TCR structures. Minimal contacts are observed by CDR1 and CDR2 of either chain. TCRs are clustered into five general contact modes according to contact profiles of all six CDRs.
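The per-position tabulation described in panel a can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each structure has already been reduced to a mapping from a position label to the minimum distance (in Å) between that residue and any peptide atom. The function name, the position labels and the toy distances are all hypothetical.

```python
from collections import Counter

def contact_probability(structures, cutoff=5.0):
    """Fraction of structures in which each position lies within `cutoff` Å of peptide.

    `structures` is a list of dicts mapping a position label to the minimum
    residue-to-peptide distance in that structure (an assumed input format).
    """
    seen = Counter()      # structures in which the position is present
    contacts = Counter()  # structures in which the position contacts peptide
    for distances in structures:
        for position, distance in distances.items():
            seen[position] += 1
            if distance <= cutoff:
                contacts[position] += 1
    return {position: contacts[position] / seen[position] for position in seen}

# Toy distances for two made-up structures (not taken from the paper).
toy_structures = [
    {"B108": 3.9, "B109": 4.2, "B27": 9.5},
    {"B108": 4.8, "B109": 6.1, "B27": 11.0},
]
probabilities = contact_probability(toy_structures)

# Positions at or above the 25% threshold used for the heat map.
high_contact = {p for p, prob in probabilities.items() if prob >= 0.25}
```

Positions that fall below the 25% threshold would be the ones drawn in black on the heat map.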
a, Class II single-cell paired α/β sequencing with crystal structure representative indicating variable CDR3β length and discontinuous role of CDR3α. Discontinuous negatively charged residues in structure 1J8H coordinate lysine-positive charges in peptide; negatively charged residues indicated in orange in alignment when found. b, Positional amino acid bias in flu HLA-A2 dominant motif CDR3β and CDR3α convergence group, normalized by amino acid diversity in the unselected repertoire. Enrichment of RS(S/A) motif in TCRβ compared with naive distribution. Enrichment of SQ at IMGT positions 112, 113 in TCRα, with enrichment of glycine at multiple positions.
First, GLIPH searches for global and local (motif) CDR3 similarity in TCR CDR regions with high contact probability. Motif significance and global similarity cutoffs are established by repeat random sampling against an unbiased reference pool of TCRs. Second, all identified global and local relationships between TCRs are used to construct clusters of TCR specificity groups. Third, each specificity group is analysed for enrichment of common V-genes, CDR3 lengths, clonal expansions, shared HLA alleles in recipients, motif significance, and cluster size. Enrichment probability is obtained by calculating the probability of obtaining at least the observed Simpson diversity index measure for that feature compared with a random sampling of equal size from the source data set. The resulting features are combined into a specificity group score for each group.
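The resampling step for a single feature, such as V-gene usage, can be sketched as follows. This is a hedged illustration rather than the GLIPH implementation. It assumes the Simpson index is taken as the probability that two members drawn without replacement share a label, so higher values mean a more concentrated feature. The function names, the number of draws and the add-one correction are assumptions.

```python
import random
from collections import Counter

def simpson_index(labels):
    """Probability that two members drawn without replacement share a label."""
    n = len(labels)
    if n < 2:
        return 0.0
    counts = Counter(labels)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))

def enrichment_probability(group_labels, source_labels, n_draws=1000, seed=0):
    """Empirical probability that a random sample of equal size from the source
    data set reaches at least the Simpson index observed in the group."""
    rng = random.Random(seed)
    observed = simpson_index(group_labels)
    size = len(group_labels)
    hits = sum(
        simpson_index(rng.sample(source_labels, size)) >= observed
        for _ in range(n_draws)
    )
    # Add-one correction (an assumption) so the estimate is never exactly zero.
    return (hits + 1) / (n_draws + 1)

# Toy source data set: 20 hypothetical V-gene labels, 5 TCRs each.
source = [f"V{i}" for i in range(20) for _ in range(5)]
p_uniform_group = enrichment_probability(["V3"] * 5, source)
p_diverse_group = enrichment_probability(["V0", "V1", "V2", "V3", "V4"], source)
```

A group whose members all use the same V-gene yields a small probability here, whereas a group with five different V-genes is no more concentrated than a random draw.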
Extended Data Figure 4 Benchmark of GLIPH subcomponents and complete algorithm on random naive TCRs or a mixed training set pool of pMHC tetramer+ TCRs of 8 known specificities.
a, GLIPH clusters up to 14.5% of tetramer+ TCRs, while clustering less than 0.5% of naive TCRs; the combination of global CDR3 similarity and local motif enrichment results in more clustering than either individually. b, The cluster results of applying GLIPH to the mixed pool of tetramer-sorted TCRs. Each node is a TCR, with its specificity indicated by colour. Edges between TCRs indicate a GLIPH-predicted shared specificity; light grey indicates a shared local motif, and dark grey indicates shared global similarity. Over 95% of cluster members are grouped with other TCRs of the same specificity. c, GLIPH components evaluated for percentage of TCRs clustered versus percentage of correct specificity assignments. Global CDR3 clustering by Hamming distance = 1 or distance = 2 is reported. Global CDR3 similarity clustering by CD-HIT, with clustering cutoffs of 0.8 or 0.9, is reported. Local motif similarity clustering with and without structural constraints is reported. Complete GLIPH, including global CDR3 identity, local CDR3 motif similarity, structural constraints and clustering scoring, resulted in 14.5% of TCRs clustering with 95% of cluster members correctly grouped with other TCRs of shared specificity. For global similarity, distance 1 resulted in effective grouping of TCRs whereas distance 2 resulted in predominantly mixed clusters. For local motifs, effective TCR clustering could only be obtained when structural contact probability masks were applied. Similarly, although CD-HIT was not effective at clustering TCRs by common specificity when provided with the entire TCR sequences, when offered only the high contact probability CDR3s it produced effective clusters, provided an appropriate clustering threshold was used. d, When run on replicate A containing TCRs from half of study subjects, GLIPH produced specificity groups whose positional weight matrices (PWMs) could then be used to score the TCRs from replicate B subjects (equations (5) and (6) in Methods).
GLIPH scoring identifies new TCRs of correct specificity from new subjects.
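The global-similarity component benchmarked in panel c (Hamming distance 1 between CDR3 sequences) can be sketched as single-linkage clustering over equal-length sequences. This is a simplified illustration, not the GLIPH implementation. It ignores the contact-probability masking and any V-gene constraints, the function names are hypothetical, and the CDR3 strings are toy inputs.

```python
from itertools import combinations

def hamming(a, b):
    """Number of mismatched positions, or None if the lengths differ."""
    if len(a) != len(b):
        return None
    return sum(x != y for x, y in zip(a, b))

def cluster_by_hamming(cdr3s, max_dist=1):
    """Group CDR3s connected by chains of pairs within `max_dist` mismatches."""
    seqs = sorted(set(cdr3s))
    parent = {s: s for s in seqs}

    def find(s):
        # Union-find root lookup with path halving.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for a, b in combinations(seqs, 2):
        dist = hamming(a, b)
        if dist is not None and dist <= max_dist:
            parent[find(a)] = find(b)

    groups = {}
    for s in seqs:
        groups.setdefault(find(s), []).append(s)
    # Singletons are left unclustered.
    return [members for members in groups.values() if len(members) > 1]

# Toy CDR3β strings for illustration only.
toy_cdr3s = ["CASSIRSSYEQYF", "CASSIRSAYEQYF", "CASSLGQAYEQYF", "CASSPGTGELFF"]
clusters = cluster_by_hamming(toy_cdr3s, max_dist=1)
```

The all-pairs comparison is quadratic in the number of sequences, which is acceptable for a sketch but would need indexing for repertoire-scale input.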
a, Gating strategy used for isolating and sorting tetramer-positive T cells. b, Frozen PBMCs from QFN+ donors are thawed, recovered and stimulated with either M. tuberculosis lysate or peptide pool. Antigen-specific T cells are single-cell-sorted into 96-well plates for TCR amplification using an established protocol (ref. 14). c, Gating strategy used for isolating and single-cell sorting antigen-specific T cells.
Extended Data Figure 6 Phenotypic analysis of clonally expanded M. tuberculosis-specific CD4+ T cells.
a, Gating strategy for isolating antigen-specific T cells. PBMCs from one QFN+ donor (02/0259) were stimulated with M. tuberculosis lysate and then stained for the activation markers CD69 and CD154. Antigen-specific CD4+ T cells were sorted by gating on the CD69+CD154+ population. Alternatively, PBMCs were stimulated with the megapool peptide library, and antigen-specific CD4+ T cells were isolated using a cytokine capture assay for IL-2 or IFNγ. b, 18-parameter (parameters listed on right side) phenotypic analysis of M. tuberculosis-specific CD4+ T cells from all 22 donors. Individual T cells are grouped by TCR sequence; each colour on the bar above the heat maps represents a distinct, clonally expanded TCR sequence. The majority of cells presented a TH1*-like phenotype including IFNγ and IL-2 production, and T-bet and RORC expression, as is characteristic of previously reported M. tuberculosis responses.
Clonal analysis of M. tuberculosis-specific CD4+ T cells from all 22 donors using different selection strategies, including stimulation by ESAT6/CFP-10 pool (C/E Pool) or Megapool followed by cytokine capture assay, and M. tuberculosis lysate stimulation followed by CD154+ selection. Each dot represents a distinct TCR sequence and the count represents the number of repeats. PMA/ionomycin stimulation was used as a non-specific stimulation control.
a, Each individual peptide from Megapool was tested against J76-NFATRE-luc cells expressing TCR025 in co-culture with K562 cells expressing DRB1*1503. Columns 1–300: individual peptides from Megapool; column 301: CD3/CD28 stimulation as positive control. Peptides predicted to be in the top 15 percentile of binding to each HLA by the MHC-II Consensus method are indicated by grey bars. Mean ± s.d. (n = 3, biological replicates) are shown. The inset table shows the restricting HLA type and responding peptides. b–d, A similar screen was also performed for TCR054 (b), TCR098 (c) and TCR088 (d).
Amino acid alignment presents first the TCRβ chain followed by the TCRα chain for naturally occurring group II TCRs n1–n10 from Fig. 5b (n denotes natural) and de novo TCRs De9–De18 from Fig. 5e. All segment identities are reported for each sequence in the sequence headers. Positional conservation is coloured as dark blue if conserved, and light blue or white if variable.
Extended Data Figure 10 Comparison of CDR3 length and 3mer motif composition of naive TCR reference set.
The naive control data set consists of 162,165 non-redundant V-J-CDR3 sequences from CD45RA+RO− naive T cells (labelled with the author name ‘Warren’; ref. 7), 83,910 non-redundant V-J-CDR3 sequences from CD4 naive T cells from 10 healthy controls, and 27,292 non-redundant V-J-CDR3 sequences from CD8 naive T cells from 10 healthy controls (ref. 8), for a total of 268,955 unique naive V-J-CDR3 sequences. a, b, Analysis of CDR3 length distributions (a) and motif frequency distributions (b) indicates that the three naive reference sets have very similar CDR3 length distributions and 3mer amino acid motif frequency distributions (r = 0.99, r = 0.95, and r = 0.94 Pearson correlation coefficients for CD4 × CD8, CD4 × Warren, and CD8 × Warren, respectively).
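The comparison in panel b, correlating 3mer motif frequencies between two reference sets, can be sketched as follows. This is an illustrative outline only. It assumes motifs are counted as every contiguous 3mer across the full CDR3 string, which may differ from the positions the authors actually used. The function names and the toy sequences are hypothetical.

```python
from collections import Counter
from math import sqrt

def kmer_frequencies(cdr3s, k=3):
    """Relative frequency of every contiguous k-mer across a set of CDR3s."""
    counts = Counter(
        seq[i:i + k] for seq in cdr3s for i in range(len(seq) - k + 1)
    )
    total = sum(counts.values())
    return {motif: count / total for motif, count in counts.items()}

def pearson(freq_a, freq_b):
    """Pearson correlation over the union of motifs (absent motifs count as 0)."""
    motifs = sorted(set(freq_a) | set(freq_b))
    xs = [freq_a.get(m, 0.0) for m in motifs]
    ys = [freq_b.get(m, 0.0) for m in motifs]
    n = len(motifs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant frequency vector")
    return cov / sqrt(var_x * var_y)

# Toy reference sets (not real repertoire data).
set_a = ["CASSIRSSYEQYF", "CASSIRSAYEQYF"]
set_b = ["CASSIRSSYEQYF", "CASSLGQAYEQYF"]
r_same = pearson(kmer_frequencies(set_a), kmer_frequencies(set_a))
r_cross = pearson(kmer_frequencies(set_a), kmer_frequencies(set_b))
```

Motifs absent from one set are treated as having zero frequency there, so both frequency vectors are defined over the same motif list.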
This file contains a Supplementary Discussion providing an overview and application summary of GLIPH and Supplementary Methods including GLIPH documentation and training set curation details.
This file contains a table of all tetramer sorted and literature derived TCRs in test dataset.
This file contains a table of structural analysis of all crystallized TCRs and their contacts.
This file contains a table of all Mtb single cell sequenced TCR clones.
This file contains a table of Tet+ test set GLIPH specificity groups.
This file contains a table of Mtb GLIPH specificity groups.
This file contains a table of HLA associations of all Mtb donors.
This file contains a table of donor replicate assignments and new predicted specificity group members.
About this article
Cite this article
Glanville, J., Huang, H., Nau, A. et al. Identifying specificity groups in the T cell receptor repertoire. Nature 547, 94–98 (2017). https://doi.org/10.1038/nature22976