T cell receptor (TCR) sequences are very diverse, with many more possible sequence combinations than T cells in any one individual (refs 1–4). Here we define the minimal requirements for TCR antigen specificity, through an analysis of TCR sequences using a panel of peptide and major histocompatibility complex (pMHC)-tetramer-sorted cells and structural data. From this analysis we developed an algorithm that we term GLIPH (grouping of lymphocyte interactions by paratope hotspots) to cluster TCRs with a high probability of sharing specificity owing to both conserved motifs and global similarity of complementarity-determining region 3 (CDR3) sequences. We show that GLIPH can reliably group TCRs of common specificity from different donors, and that the conserved CDR3 motifs that help to define the TCR clusters are often contact points with the antigenic peptides. As an independent validation, we analysed 5,711 TCRβ chain sequences from reactive CD4 T cells from 22 individuals with latent Mycobacterium tuberculosis infection. We found 141 TCR specificity groups, including 16 distinct groups containing TCRs from multiple individuals. These TCR groups typically shared HLA alleles, allowing prediction of the likely HLA restriction, and a large number of M. tuberculosis T cell epitopes enabled us to identify pMHC ligands for all five of the groups tested. Mutagenesis and de novo TCR design confirmed that the GLIPH-identified motifs were critical and sufficient for shared-antigen recognition. Thus the GLIPH algorithm can analyse large numbers of TCR sequences and define TCR specificity groups shared by TCRs and individuals, which should greatly accelerate the analysis of T cell responses and expedite the identification of specific ligands.
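To make the clustering idea concrete, the following is a minimal sketch of grouping CDR3 sequences by the two kinds of evidence the abstract names: global similarity and shared motifs. It is not the published GLIPH code. The function names, the one-mismatch threshold, the 4-residue motif length, the end trimming and the example sequences are all assumptions chosen for illustration, and this toy version does no statistical testing of whether a motif is enriched.

```python
from collections import defaultdict
from itertools import combinations


def hamming(a, b):
    """Count mismatched positions; only defined for equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def kmers(seq, k):
    """Return every contiguous substring of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def group_cdr3s(cdr3s, max_mismatches=1, motif_len=4, trim=3):
    """Cluster CDR3s linked by global similarity or a shared motif.

    All thresholds are illustrative assumptions, not published parameters.
    """
    seqs = sorted(set(cdr3s))
    parent = {s: s for s in seqs}  # union-find forest over sequences

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving
            s = parent[s]
        return s

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Global similarity: same length and at most max_mismatches substitutions.
    for a, b in combinations(seqs, 2):
        if len(a) == len(b) and hamming(a, b) <= max_mismatches:
            union(a, b)

    # Local similarity: a shared k-mer in the central region, ignoring
    # `trim` residues at each end of the CDR3.
    by_motif = defaultdict(list)
    for s in seqs:
        core = s[trim:len(s) - trim]
        for motif in kmers(core, motif_len):
            by_motif[motif].append(s)
    for members in by_motif.values():
        for other in members[1:]:
            union(members[0], other)

    clusters = defaultdict(list)
    for s in seqs:
        clusters[find(s)].append(s)
    return sorted(clusters.values())


# Made-up CDR3 strings, used only to exercise the function.
example = [
    "CASSLAPGATNEKLFF",
    "CASSLAPGTTNEKLFF",  # one substitution away from the sequence above
    "CASSIRSSYEQYF",
    "CASTGIRSSYNEQFF",   # different length, shares the motif "IRSS"
    "CASSPGQGDTQYF",     # no link to the others
]
groups = group_cdr3s(example)
for group in groups:
    print(group)
```

Union-find is used here so that links from either criterion merge transitively into one group; a real analysis would also need to track donor and HLA information per sequence, which this sketch omits.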


  1. et al. A direct estimate of the human αβ T cell receptor diversity. Science 286, 958–961 (1999)

  2. & T-cell antigen receptor genes and T-cell recognition. Nature 334, 395–402 (1988)

  3. et al. Diversity and clonal selection in the human T-cell repertoire. Proc. Natl Acad. Sci. USA 111, 13139–13144 (2014)

  4. , , & The generation and fate of thymocytes. Semin. Immunol. 2, 3–12 (1990)

  5. , & How TCRs bind MHCs, peptides, and coreceptors. Annu. Rev. Immunol. 24, 419–466 (2006)

  6. et al. The Protein Data Bank. Nucleic Acids Res. 28, 235–242 (2000)

  7. et al. Exhaustive T-cell repertoire sequencing of human peripheral blood samples reveals signatures of antigen selection and a directly measured repertoire size of at least 1 million clonotypes. Genome Res. 21, 790–797 (2011)

  8. et al. Individual heritable differences result in unique cell lymphocyte receptor repertoires of naive and antigen-experienced cells. Nat. Commun. 7, 11112 (2016)

  9. & CD-HIT: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22, 1658–1659 (2006)

  10. et al. Broad TCR repertoire and diverse structural solutions for recognition of an immunodominant CD8+ T cell epitope. Nat. Struct. Mol. Biol. 24, 395–406 (2017)

  11. , & A live-cell assay to detect antigen-specific CD4+ T cells with diverse cytokine profiles. Nat. Med. 11, 1113–1117 (2005)

  12. et al. Direct access to CD4+ T cells specific for defined antigens according to CD154 expression. Nat. Med. 11, 1118–1124 (2005)

  13. et al. A quantitative analysis of complexity of human pathogen-specific CD4 T cell responses in healthy M. tuberculosis-infected South Africans. PLoS Pathog. 12, e1005760 (2016)

  14. , , & Linking T-cell receptor sequence to functional phenotype at the single-cell level. Nat. Biotechnol. 32, 684–692 (2014)

  15. et al. Memory T cells in latent Mycobacterium tuberculosis infection are directed against three antigenic islands and largely contained in a CXCR3+CCR6+ TH1 subset. PLoS Pathog. 9, e1003130 (2013)

  16. Heterogeneity of human CD4+ T cells against microbes. Annu. Rev. Immunol. 34, 317–334 (2016)

  17. & T-cell epitope discovery technologies. Hum. Immunol. 75, 514–519 (2014)

  18. et al. Peptide binding predictions for HLA DR, DP and DQ molecules. BMC Bioinformatics 11, 568 (2010)

  19. et al. Predictive factors for latent tuberculosis infection among adolescents in a high-burden area in South Africa. Int. J. Tuberc. Lung Dis. 15, 331–336 (2011)

  20. et al. Dietary gluten triggers concomitant activation of CD4+ and CD8+ αβ T cells and γδ T cells in celiac disease. Proc. Natl Acad. Sci. USA 110, 13073–13078 (2013)

  21. et al. Naive antibody gene-segment frequencies are heritable and unaltered by chronic lymphocyte ablation. Proc. Natl Acad. Sci. USA 108, 20066–20071 (2011)

  22. et al. Precise determination of the diversity of a combinatorial antibody library gives insight into the human immunoglobulin repertoire. Proc. Natl Acad. Sci. USA 106, 20216–20221 (2009)

  23. et al. Dynamics and memory of heterochromatin in living cells. Cell 149, 1447–1460 (2012)


We thank the Stanford Human Immune Monitoring Center for high-throughput sequencing support, and M. Mindrinos and co-workers at Sirona Genomics for the HLA typing. We especially thank The Bill and Melinda Gates Foundation, The National Institutes of Health (2U19 AI057229) and the Howard Hughes Medical Institute for financial support, S.-A. Xue for providing the Jurkat 76 T cell line, C.-Y. Chang and R. Taniguchi for helping with lentiviral transduction, R. Hovde for valuable discussions regarding statistical measures, H. Mahomed, W. Hanekom and members of the Adolescent Cohort Study (ACS) group for enrolment and follow-up of the M. tuberculosis-infected adolescents, and R. Bedi for collecting TCR sequences from the literature. Sorting was (partially) performed in the Shared FACS Facility obtained using NIH S10 Shared instrument grant (S10RR025518-01).

Author information

Author notes

    • Olivia Hatton & Arnold Han

    Present addresses: Department of Molecular Biology, Colorado College, Colorado Springs, Colorado 80905, USA (O.H.); Department of Medicine and Microbiology and Immunology, Columbia University, New York, New York 10032, USA (A.H.).

    • Jacob Glanville & Huang Huang

    These authors contributed equally to this work.


  1. Computational and Systems Immunology Program, Stanford University School of Medicine, Stanford, California 94305, USA

    • Jacob Glanville
  2. Institute for Immunity, Transplantation and Infection, Stanford University School of Medicine, Stanford, California 94305, USA

    • Jacob Glanville, Huang Huang, Allison Nau, Lisa E. Wagar, Florian Rubelt, Xuhuai Ji & Mark M. Davis
  3. Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, California 94305, USA

    • Huang Huang, Allison Nau, Lisa E. Wagar & Mark M. Davis
  4. Department of Surgery, Stanford University School of Medicine, Stanford, California 94305, USA

    • Olivia Hatton, Sheri M. Krams & Olivia M. Martinez
  5. Human Immune Monitoring Center, Stanford University School of Medicine, Stanford, California 94305, USA

    • Xuhuai Ji
  6. Department of Medicine, Stanford University School of Medicine, Stanford, California 94305, USA

    • Arnold Han & Scott D. Boyd
  7. PSM Biotechnology, University of San Francisco, California 94305, USA

    • Christina Pettus & Nikhil Haas
  8. La Jolla Institute for Allergy and Immunology, Division of Vaccine Discovery, La Jolla, California 92037, USA

    • Cecilia S. Lindestam Arlehamn & Alessandro Sette
  9. Department of Pathology, Stanford University School of Medicine, Stanford, California 94305, USA

    • Scott D. Boyd
  10. South African Tuberculosis Vaccine Initiative, Institute of Infectious Disease and Molecular Medicine and Division of Immunology, Department of Pathology, University of Cape Town, Cape Town, South Africa

    • Thomas J. Scriba
  11. The Howard Hughes Medical Institute, Stanford University School of Medicine, Stanford, California 94305, USA

    • Mark M. Davis



J.G., H.H. and M.M.D. conceptualized the study; H.H. and J.G. performed the experiments with assistance from A.N. and O.H.; J.G., H.H. and M.M.D. performed analysis with assistance from C.P., S.M.K., S.D.B. and O.M.M.; J.G. authored the codebase with assistance from N.H.; X.H.J. and A.H. provided help with TCR sequencing; C.L.A. and A.S. provided the megapool and data interpretation; T.J.S. provided samples from the M. tuberculosis study; F.R. and L.E.W. provided sequencing data; J.G., H.H. and M.M.D. wrote the manuscript with input from all authors; M.M.D. supervised the study.

Competing interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to Mark M. Davis.

Reviewer Information: Nature thanks B. Chain, R. Holt and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Extended data

Supplementary information

PDF files

  1. Supplementary Information

    This file contains a Supplementary Discussion providing an overview and application summary of GLIPH and Supplementary Methods including GLIPH documentation and training set curation details.

Excel files

  1. Supplementary Table 1

    This file contains a table of all tetramer-sorted and literature-derived TCRs in the test dataset.

  2. Supplementary Table 2

    This file contains a table of the structural analysis of all crystallized TCRs and their contacts.

  3. Supplementary Table 3

    This file contains a table of all Mtb single-cell-sequenced TCR clones.

  4. Supplementary Table 4

    This file contains a table of Tet+ test set GLIPH specificity groups.

  5. Supplementary Table 5

    This file contains a table of Mtb GLIPH specificity groups.

  6. Supplementary Table 6

    This file contains a table of HLA associations of all Mtb donors.

  7. Supplementary Table 7

    This file contains a table of donor replicate assignments and new predicted specificity group members.
