Identifying specificity groups in the T cell receptor repertoire


T cell receptor (TCR) sequences are very diverse, with many more possible sequence combinations than T cells in any one individual (refs 1–4). Here we define the minimal requirements for TCR antigen specificity, through an analysis of TCR sequences using a panel of peptide and major histocompatibility complex (pMHC)-tetramer-sorted cells and structural data. From this analysis we developed an algorithm that we term GLIPH (grouping of lymphocyte interactions by paratope hotspots) to cluster TCRs with a high probability of sharing specificity owing to both conserved motifs and global similarity of complementarity-determining region 3 (CDR3) sequences. We show that GLIPH can reliably group TCRs of common specificity from different donors, and that conserved CDR3 motifs, which are often contact points with the antigenic peptides, help to define the TCR clusters. As an independent validation, we analysed 5,711 TCRβ chain sequences from reactive CD4 T cells from 22 individuals with latent Mycobacterium tuberculosis infection. We found 141 TCR specificity groups, including 16 distinct groups containing TCRs from multiple individuals. These TCR groups typically shared HLA alleles, allowing prediction of the likely HLA restriction, and a large number of M. tuberculosis T cell epitopes enabled us to identify pMHC ligands for all five of the groups tested. Mutagenesis and de novo TCR design confirmed that the GLIPH-identified motifs were critical and sufficient for shared-antigen recognition. Thus the GLIPH algorithm can analyse large numbers of TCR sequences and define TCR specificity groups shared by TCRs and individuals, which should greatly accelerate the analysis of T cell responses and expedite the identification of specific ligands.

Figure 1: Characteristics of TCRs reactive to common antigens across individuals.
Figure 2: Crystal structure representatives of TCR specificity groups reveal the structural basis for antigen-specific paratope convergence.
Figure 3: TCR specificity groups and predicted HLA-restriction among M. tuberculosis-infected subjects.
Figure 4: Identification of common antigen recognition by TCR specificity groups.
Figure 5: Mutagenesis validation and de novo TCR design.

References

  1. Arstila, T. P. et al. A direct estimate of the human αβ T cell receptor diversity. Science 286, 958–961 (1999)
  2. Davis, M. M. & Bjorkman, P. J. T-cell antigen receptor genes and T-cell recognition. Nature 334, 395–402 (1988)
  3. Qi, Q. et al. Diversity and clonal selection in the human T-cell repertoire. Proc. Natl Acad. Sci. USA 111, 13139–13144 (2014)
  4. Shortman, K., Egerton, M., Spangrude, G. J. & Scollay, R. The generation and fate of thymocytes. Semin. Immunol. 2, 3–12 (1990)
  5. Rudolph, M. G., Stanfield, R. L. & Wilson, I. A. How TCRs bind MHCs, peptides, and coreceptors. Annu. Rev. Immunol. 24, 419–466 (2006)
  6. Berman, H. M. et al. The Protein Data Bank. Nucleic Acids Res. 28, 235–242 (2000)
  7. Warren, R. L. et al. Exhaustive T-cell repertoire sequencing of human peripheral blood samples reveals signatures of antigen selection and a directly measured repertoire size of at least 1 million clonotypes. Genome Res. 21, 790–797 (2011)
  8. Rubelt, F. et al. Individual heritable differences result in unique cell lymphocyte receptor repertoires of naive and antigen-experienced cells. Nat. Commun. 7, 11112 (2016)
  9. Li, W. & Godzik, A. CD-HIT: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22, 1658–1659 (2006)
  10. Song, I. et al. Broad TCR repertoire and diverse structural solutions for recognition of an immunodominant CD8+ T cell epitope. Nat. Struct. Mol. Biol. 24, 395–406 (2017)
  11. Chattopadhyay, P. K., Yu, J. & Roederer, M. A live-cell assay to detect antigen-specific CD4+ T cells with diverse cytokine profiles. Nat. Med. 11, 1113–1117 (2005)
  12. Frentsch, M. et al. Direct access to CD4+ T cells specific for defined antigens according to CD154 expression. Nat. Med. 11, 1118–1124 (2005)
  13. Lindestam Arlehamn, C. S. et al. A quantitative analysis of complexity of human pathogen-specific CD4 T cell responses in healthy M. tuberculosis-infected South Africans. PLoS Pathog. 12, e1005760 (2016)
  14. Han, A., Glanville, J., Hansmann, L. & Davis, M. M. Linking T-cell receptor sequence to functional phenotype at the single-cell level. Nat. Biotechnol. 32, 684–692 (2014)
  15. Lindestam Arlehamn, C. S. et al. Memory T cells in latent Mycobacterium tuberculosis infection are directed against three antigenic islands and largely contained in a CXCR3+CCR6+ TH1 subset. PLoS Pathog. 9, e1003130 (2013)
  16. Sallusto, F. Heterogeneity of human CD4+ T cells against microbes. Annu. Rev. Immunol. 34, 317–334 (2016)
  17. Sharma, G. & Holt, R. A. T-cell epitope discovery technologies. Hum. Immunol. 75, 514–519 (2014)
  18. Wang, P. et al. Peptide binding predictions for HLA DR, DP and DQ molecules. BMC Bioinformatics 11, 568 (2010)
  19. Mahomed, H. et al. Predictive factors for latent tuberculosis infection among adolescents in a high-burden area in South Africa. Int. J. Tuberc. Lung Dis. 15, 331–336 (2011)
  20. Han, A. et al. Dietary gluten triggers concomitant activation of CD4+ and CD8+ αβ T cells and γδ T cells in celiac disease. Proc. Natl Acad. Sci. USA 110, 13073–13078 (2013)
  21. Glanville, J. et al. Naive antibody gene-segment frequencies are heritable and unaltered by chronic lymphocyte ablation. Proc. Natl Acad. Sci. USA 108, 20066–20071 (2011)
  22. Glanville, J. et al. Precise determination of the diversity of a combinatorial antibody library gives insight into the human immunoglobulin repertoire. Proc. Natl Acad. Sci. USA 106, 20216–20221 (2009)
  23. Hathaway, N. A. et al. Dynamics and memory of heterochromatin in living cells. Cell 149, 1447–1460 (2012)


We thank the Stanford Human Immune Monitoring Center for high-throughput sequencing support, and M. Mindrinos and co-workers at Sirona Genomics for the HLA typing. We especially thank The Bill and Melinda Gates Foundation, The National Institutes of Health (2U19 AI057229) and the Howard Hughes Medical Institute for financial support, S.-A. Xue for providing the Jurkat 76 T cell line, C.-Y. Chang and R. Taniguchi for helping with lentiviral transduction, R. Hovde for valuable discussions regarding statistical measures, H. Mahomed, W. Hanekom and members of the Adolescent Cohort Study (ACS) group for enrolment and follow-up of the M. tuberculosis-infected adolescents, and R. Bedi for collecting TCR sequences from the literature. Sorting was (partially) performed in the Shared FACS Facility obtained using NIH S10 Shared instrument grant (S10RR025518-01).

Author information

Authors and Affiliations



J.G., H.H. and M.M.D. conceptualized the study; H.H. and J.G. performed the experiments with assistance from A.N. and O.H.; J.G., H.H. and M.M.D. performed analysis with assistance from C.P., S.M.K., S.D.B. and O.M.M.; J.G. authored the codebase with assistance from N.H.; X.H.J. and A.H. provided help with TCR sequencing; C.L.A. and A.S. provided the megapool and data interpretation; T.J.S. provided samples from the M. tuberculosis study; F.R. and L.E.W. provided sequencing data; J.G., H.H. and M.M.D. wrote the manuscript with input from all authors; M.M.D. supervised the study.

Corresponding author

Correspondence to Mark M. Davis.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Additional information

Reviewer Information Nature thanks B. Chain, R. Holt and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Extended data figures and tables

Extended Data Figure 1 TCRs specific to common antigens show motifs within a limited region of CDR residues with high structural contact propensity.

a, Probability of IMGT TCR CDR positions being within 5 Å of peptide antigen, as tabulated from 52 published crystal structures of TCR–pMHC interactions (Supplementary Table 2), and displayed as a heat map on representative TCR 2j8u. Positions with less than 25% contact probability are shown in black. b, Alignment of 52 non-redundant (<95% amino acid identity between any pair) TCR sequences from TCR–pMHC PDB structure complexes. Positions within 5 Å of peptide antigen are indicated in dark blue. A linear set of 3–5 amino acids in CDR3β is observed in almost every structure, with TCRβ CDR3 IMGT positions 108–111 being in contact in 90% of TCR structures. Minimal contacts are observed for CDR1 and CDR2 of either chain. TCRs are clustered into five general contact modes according to the contact profiles of all six CDRs.
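The per-position contact probability in panel a amounts to a tally over structures. The following is a minimal Python sketch of that tally, assuming each structure has already been reduced to a mapping from IMGT position to the minimum residue-to-peptide distance in Å. The input format, the function name and the toy distances are illustrative assumptions, not the authors' pipeline or data.

```python
CONTACT_CUTOFF = 5.0  # Å, the cutoff quoted in the legend
MASK_THRESHOLD = 0.25  # positions below this contact probability are masked


def contact_probability(structures, cutoff=CONTACT_CUTOFF):
    """Fraction of structures in which each position lies within `cutoff` of the peptide.

    `structures` is a list of dicts mapping an IMGT position label to the
    minimum residue-to-peptide distance in Å (hypothetical input format).
    A position missing from a structure counts as non-contacting there.
    """
    if not structures:
        raise ValueError("need at least one structure")
    hits = {}
    for distances in structures:
        for position, dist in distances.items():
            hits[position] = hits.get(position, 0) + (1 if dist <= cutoff else 0)
    return {pos: n / len(structures) for pos, n in hits.items()}


# Toy distances for four made-up structures (not real measurements).
toy_structures = [
    {"108": 3.9, "109": 4.2, "27": 9.0, "56": 8.0},
    {"108": 4.8, "109": 6.1, "27": 7.5},
    {"108": 3.1, "109": 4.9},
    {"108": 5.6, "109": 3.3, "27": 4.4},
]

probs = contact_probability(toy_structures)
masked = sorted(pos for pos, p in probs.items() if p < MASK_THRESHOLD)
print(probs)
print("masked positions:", masked)
```

Dividing by the total number of structures, rather than by the number of structures in which a position is present, is one possible convention; the legend does not say which was used.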

Extended Data Figure 2 Crystal structure representative of TCR specificity groups.

a, Class II single-cell paired α/β sequencing with crystal structure representative indicating variable CDR3β length and discontinuous role of CDR3α. Discontinuous negatively charged residues in structure 1J8H coordinate lysine-positive charges in peptide; negatively charged residues indicated in orange in alignment when found. b, Positional amino acid bias in flu HLA-A2 dominant motif CDR3β and CDR3α convergence group, normalized by amino acid diversity in the unselected repertoire. Enrichment of RS(S/A) motif in TCRβ compared with naive distribution. Enrichment of SQ at IMGT positions 112, 113 in TCRα, with enrichment of glycine at multiple positions.

Extended Data Figure 3 Three-step GLIPH algorithm.

First, GLIPH searches for global and local (motif) CDR3 similarity in TCR CDR regions with high contact probability. Motif significance and global similarity cutoffs are established by repeat random sampling against an unbiased reference pool of TCRs. Second, all identified global and local relationships between TCRs are used to construct clusters of TCR specificity groups. Third, each specificity group is analysed for enrichment of common V-genes, CDR3 lengths, clonal expansions, shared HLA alleles in recipients, motif significance, and cluster size. Enrichment probability is obtained by calculating the probability of obtaining at least the observed Simpson diversity index measure for that feature compared with a random sampling of equal size from the source data set. The resulting features are combined into a specificity group score for each group.
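The idea of setting motif significance by repeat random sampling against a reference pool can be illustrated with a short Python sketch. It is a simplified stand-in, not the GLIPH codebase: it counts a motif over whole CDR3 strings rather than only the high-contact-probability region, and the function names, the add-one p-value estimate, the number of resamples and the toy sequences are all assumptions made for illustration.

```python
import random


def motif_count(cdr3s, motif):
    """Number of sequences that contain `motif` at least once."""
    return sum(1 for seq in cdr3s if motif in seq)


def motif_enrichment(sample, reference, motif, n_resamples=1000, seed=0):
    """Compare a motif count in `sample` with equal-size random draws from `reference`.

    Returns (observed count, mean count over resamples, empirical p-value).
    The p-value is the add-one estimate of P(resampled count >= observed).
    `reference` must be a list at least as long as `sample`.
    """
    rng = random.Random(seed)
    observed = motif_count(sample, motif)
    at_least, total = 0, 0
    for _ in range(n_resamples):
        draw = rng.sample(reference, len(sample))  # sampling without replacement
        count = motif_count(draw, motif)
        total += count
        if count >= observed:
            at_least += 1
    expected = total / n_resamples
    p_value = (at_least + 1) / (n_resamples + 1)
    return observed, expected, p_value


# Toy data: invented CDR3β-like strings, not sequences from the study.
sample = ["CASSRSSYEQYF", "CASRSSGNTIYF", "CAIRSSQETQYF"]
reference = [
    "CASSLGQGAYEQYF",
    "CASSPTGGNTEAFF",
    "CASSQDRGYGYTF",
    "CASSLAGGPDTQYF",
    "CASSFGTDTQYF",
]

observed, expected, p_value = motif_enrichment(sample, reference, "RSS")
print(observed, expected, p_value)
```

Because the toy reference contains no sequence with the motif, every resample has a count of zero, so the p-value is bounded below only by the number of resamples. A real reference pool would be far larger than the sample.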

Extended Data Figure 4 Benchmark of GLIPH subcomponents and complete algorithm on random naive TCRs or a mixed training set pool of pMHC tetramer+ TCRs of 8 known specificities.

a, GLIPH clusters up to 14.5% of tetramer+ TCRs while clustering less than 0.5% of naive TCRs; combining global CDR3 similarity with local motif enrichment clusters more TCRs than either criterion alone. b, Cluster results from applying GLIPH to the mixed pool of tetramer-sorted TCRs. Each node is a TCR, with its specificity indicated by colour. Edges between TCRs indicate a GLIPH-predicted shared specificity: light grey edges indicate a shared local motif, and dark grey edges indicate shared global similarity. Over 95% of cluster members are grouped with other TCRs of the same specificity. c, GLIPH components evaluated for percentage of TCRs clustered versus percentage of correct specificity assignments. Global CDR3 clustering by Hamming distance 1 or 2 is reported. Global CDR3 similarity clustering by CD-HIT is reported for clustering cutoffs of 0.8 and 0.9. Local motif similarity clustering is reported with and without structural constraints. Complete GLIPH, including global CDR3 identity, local CDR3 motif similarity, structural constraints and clustering scoring, resulted in 14.5% of TCRs clustering, with 95% of cluster members correctly grouped with other TCRs of shared specificity. For global similarity, distance 1 resulted in effective grouping of TCRs, whereas distance 2 resulted in predominantly mixed clusters. For local motifs, effective TCR clustering could only be obtained when structural contact probability masks were applied. Similarly, although CD-HIT was not effective at clustering TCRs by common specificity when provided with the entire TCR sequences, it clustered effectively when given only the high-contact-probability CDR3s, provided an appropriate clustering threshold was used. d, When run on replicate A, containing TCRs from half of the study subjects, GLIPH produced specificity groups whose positional weight matrices (PWMs) could then be used to score the TCRs from replicate B subjects (equations (5) and (6) in Methods). GLIPH scoring identifies new TCRs of correct specificity from new subjects.
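The global-similarity component benchmarked in panel c can be pictured as linking CDR3s that lie within a Hamming distance threshold and reading off the connected components. The Python sketch below illustrates that idea only; it is not the GLIPH implementation, and the function names, the union-find approach, the decision to drop singletons and the toy sequences are assumptions.

```python
from itertools import combinations


def hamming(a, b):
    """Hamming distance between equal-length strings, or None if lengths differ."""
    if len(a) != len(b):
        return None
    return sum(x != y for x, y in zip(a, b))


def cluster_by_hamming(cdr3s, max_dist=1):
    """Group sequences into connected components of the 'within max_dist' graph.

    Returns a sorted list of clusters (each a sorted list); singletons are dropped.
    The all-pairs comparison is quadratic, which is fine for a sketch only.
    """
    seqs = sorted(set(cdr3s))
    parent = {s: s for s in seqs}

    def find(s):
        # Follow parent links to the component root, compressing the path.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for a, b in combinations(seqs, 2):
        dist = hamming(a, b)
        if dist is not None and dist <= max_dist:
            parent[find(a)] = find(b)  # merge the two components

    groups = {}
    for s in seqs:
        groups.setdefault(find(s), []).append(s)
    return sorted(g for g in groups.values() if len(g) > 1)


# Toy data: invented CDR3β-like strings, not sequences from the study.
toy_cdr3s = [
    "CASSIRSSYEQYF",
    "CASSIRSAYEQYF",
    "CASSIRSGYEQYF",
    "CASSLAPGATNEKLFF",
    "CASSLAPGTTNEKLFF",
    "CASSPGQGNTEAFF",
]

clusters = cluster_by_hamming(toy_cdr3s, max_dist=1)
for cluster in clusters:
    print(cluster)
```

Raising `max_dist` to 2 links more sequences per cluster, which is consistent with the legend's observation that distance 2 produced predominantly mixed clusters on the benchmark data.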

Extended Data Figure 5 Platform for PBMC stimulation and characterization of antigen-specific TCRs.

a, Gating strategy used for isolating and sorting tetramer-positive T cells. b, Frozen PBMCs from QFN+ donors are thawed, recovered and stimulated with either M. tuberculosis lysate or peptide pool. Antigen-specific T cells are single-cell-sorted into 96-well plates for TCR amplification using an established protocol (ref. 14). c, Gating strategy used for isolating and single-cell sorting antigen-specific T cells.

Extended Data Figure 6 Phenotypic analysis of clonal expanded M. tuberculosis-specific CD4+ T cells.

a, Gating strategy for isolating antigen-specific T cells. PBMCs from one QFN+ donor (02/0259) were stimulated with M. tuberculosis lysate and then stained for the activation markers CD69 and CD154. Antigen-specific CD4+ T cells were sorted by gating on the CD69+CD154+ population. Alternatively, PBMCs were stimulated with the megapool peptide library, and antigen-specific CD4+ T cells were isolated using a cytokine capture assay for IL-2 or IFNγ. b, 18-parameter (parameters listed on right side) phenotypic analysis of M. tuberculosis-specific CD4+ T cells from all 22 donors. Individual T cells are grouped by TCR sequence; each colour on the bar above the heat maps represents a distinct, clonally expanded TCR sequence. The majority of cells presented a TH1*-like phenotype, including IFNγ and IL-2 production and T-bet and RORC expression, as is characteristic of previously reported M. tuberculosis responses.

Extended Data Figure 7 Clonal expansion of M. tuberculosis-specific CD4+ T cells.

Clonal analysis of M. tuberculosis-specific CD4+ T cells from all 22 donors using different selection strategies, including stimulation by ESAT6/CFP-10 pool (C/E Pool) or Megapool followed by cytokine capture assay, and M. tuberculosis lysate stimulation followed by CD154+ selection. Each dot represents a distinct TCR sequence and the count represents the number of repeats. PMA/ionomycin stimulation was used as a non-specific stimulation control.

Extended Data Figure 8 Epitope screen using luciferase assay.

a, Each individual peptide from the megapool was tested against J76-NFATRE-luc cells expressing TCR025 in co-culture with K562 cells expressing DRB1*1503. Columns 1–300: individual peptides from the megapool; column 301: CD3/CD28 stimulation as positive control. Peptides predicted to be in the top 15 percentile of binding to each HLA by the MHC-II Consensus method are indicated by grey bars. Mean ± s.d. (n = 3, biological replicates) are shown. The inset table shows the restricted HLA type and responding peptides. b–d, A similar screen was also performed for TCR054 (b), TCR098 (c) and TCR088 (d).

Extended Data Figure 9 Amino acid alignment of naturally occurring and de novo group II TCRs.

The amino acid alignment presents first the TCRβ chain, followed by the TCRα chain, for naturally occurring group II TCRs n1–n10 from Fig. 5b (n denotes natural) and de novo TCRs De9–De18 from Fig. 5e. All segment identities are reported for each sequence in the sequence headers. Positional conservation is coloured as dark blue if conserved, and light blue or white if variable.

Extended Data Figure 10 Comparison of CDR3 length and 3mer motif composition of naive TCR reference set.

The naive control data set consists of 162,165 non-redundant V-J-CDR3 sequences from CD45RA+RO− naive T cells (labelled with the author name ‘Warren’; ref. 7), 83,910 non-redundant V-J-CDR3 sequences from CD4 naive T cells from 10 healthy controls, and 27,292 non-redundant V-J-CDR3 sequences from CD8 naive T cells from 10 healthy controls (ref. 8), for a total of 268,955 unique naive V-J-CDR3 sequences. a, b, Analysis of CDR3 length distributions (a) and motif frequency distributions (b) indicates that the three naive reference sets have very similar CDR3 length distributions and 3mer amino acid motif frequency distributions (r = 0.99, r = 0.95, and r = 0.94 Pearson correlation coefficients for CD4 × CD8, CD4 × Warren, and CD8 × Warren, respectively).
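A comparison of 3mer motif frequency distributions between two reference sets, as summarized by the Pearson coefficients above, could be computed along the following lines. This Python sketch is illustrative only: the function names, the choice to count every overlapping 3mer across the whole CDR3, and the toy sequences are assumptions, and the legend does not state how the authors normalized the counts.

```python
from collections import Counter
from math import sqrt


def kmer_frequencies(cdr3s, k=3):
    """Relative frequency of every overlapping k-mer across a set of sequences."""
    counts = Counter()
    for seq in cdr3s:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()} if total else {}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(vx * vy)


def motif_frequency_correlation(set_a, set_b, k=3):
    """Correlate k-mer frequencies over the union of k-mers seen in either set."""
    fa, fb = kmer_frequencies(set_a, k), kmer_frequencies(set_b, k)
    kmers = sorted(set(fa) | set(fb))
    return pearson([fa.get(m, 0.0) for m in kmers],
                   [fb.get(m, 0.0) for m in kmers])


# Toy data: invented CDR3β-like strings, not sequences from the reference sets.
set_a = ["CASSLGQGAYEQYF", "CASSPTGGNTEAFF"]
set_b = ["CASSLAGGPDTQYF", "CASSFGTDTQYF", "CASSQDRGYGYTF"]

r_self = motif_frequency_correlation(set_a, set_a)
r_ab = motif_frequency_correlation(set_a, set_b)
print(round(r_self, 3), round(r_ab, 3))
```

Motifs absent from one set enter the comparison with frequency zero, so rare motifs that appear in only one reference set lower the coefficient rather than being ignored.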

Supplementary information

Supplementary Information

This file contains a Supplementary Discussion providing an overview and application summary of GLIPH and Supplementary Methods including GLIPH documentation and training set curation details.

Supplementary Table 1

This file contains a table of all tetramer sorted and literature derived TCRs in test dataset.

Supplementary Table 2

This file contains a table of structural analysis of all crystallized TCRs and their contacts.

Supplementary Table 3

This file contains a table of all Mtb single cell sequenced TCR clones.

Supplementary Table 4

This file contains a table of Tet+ test set GLIPH specificity groups.

Supplementary Table 5

This file contains a table of Mtb GLIPH specificity groups.

Supplementary Table 6

This file contains a table of HLA associations of all Mtb donors.

Supplementary Table 7

This file contains a table of donor replicate assignments and new predicted specificity group members.

About this article

Cite this article

Glanville, J., Huang, H., Nau, A. et al. Identifying specificity groups in the T cell receptor repertoire. Nature 547, 94–98 (2017).
