Mapping the functional landscape of T cell receptor repertoires by single-T cell transcriptomics


Many experimental and bioinformatics approaches have been developed to characterize the human T cell receptor (TCR) repertoire. However, the unknown functional relevance of TCR profiling hinders unbiased interpretation of the biology of T cells. To address this inadequacy, we developed tessa, a tool to integrate TCRs with gene expression of T cells to estimate the effect that TCRs confer on the phenotypes of T cells. Tessa leveraged techniques combining single-cell RNA-sequencing with TCR sequencing. We validated tessa and showed its superiority over existing approaches that investigate only the TCR sequences. With tessa, we demonstrated that TCR similarity constrains the phenotypes of T cells to be similar and dictates a gradient in antigen targeting efficiency of T cell clonotypes with convergent TCRs. We showed this constraint could predict a functional dichotomization of T cells postimmunotherapy treatment and is weakened in tumor contexts.

Fig. 1: The tessa algorithm.
Fig. 2: TCR networks demonstrate a gradient of targeting efficiency.
Fig. 3: TCR similarity determines fate of T cells postimmunotherapy treatment.
Fig. 4: CD8+ T cells are functionally constrained by TCRs differently in healthy donors and tumor patients.

Data availability

The bulk RNA-seq datasets used for deriving TCRs and then for the auto-encoder training are publicly available from TCGA (ref. 23), IEDB and McPAS (ref. 25). We made the Kidney-bulkRNA (ref. 24) dataset available in csv format. All scRNA-seq/TCR-seq datasets are publicly available. The NSCLC-1 and healthy PBMC-1 datasets are available on the 10x Genomics website. The healthy-CD8 1–4 datasets and the healthy PBMC-2 dataset are also available from 10x Genomics. The NSCLC-2 (ref. 26), CRC (ref. 27) and HCC (ref. 28) datasets were downloaded from the European Genome-Phenome Archive (EGA) under accession numbers EGAS00001002430, EGAS00001002791 and EGAS00001002072, respectively. The Breast-1–5 (ref. 29) datasets are available on the Gene Expression Omnibus (GEO) under accession numbers GSE114727 and GSE114724. The Melanoma (ref. 30), BCC (ref. 31) and ECCITE-Seq (ref. 16) datasets are also on the GEO database under accession numbers GSE123139, GSE113590 and GSE126310. The Glanville (ref. 10) dataset is also publicly available. The Dash (ref. 11) dataset is available in the National Center for Biotechnology Information Sequence Read Archive under accession number SRP101659. The details of the data used, including sample size, role in the analysis and references, are shown in Supplementary Table 1. All scRNA-seq data were used in Fig. 2 (directly or indirectly) and in Fig. 4, and the BCC scRNA-seq data were used in Fig. 3. Source data are provided with this paper.

Code availability

The tessa model is available in the jcao89757/TESSA repository, archived on Zenodo (ref. 46). The SCINA model is described in ref. 45.


References

1. Oettinger, M. A. V(D)J recombination: on the cutting edge. Curr. Opin. Cell Biol. 11, 325–329 (1999).
2. Jung, D. & Alt, F. W. Unraveling V(D)J recombination: insights into gene regulation. Cell 116, 299–311 (2004).
3. Kappler, J. et al. The major histocompatibility complex-restricted antigen receptor on T cells in mouse and man: identification of constant and variable peptides. Cell 35, 295–302 (1983).
4. Haskins, K. et al. The major histocompatibility complex-restricted antigen receptor on T cells. I. Isolation with a monoclonal antibody. J. Exp. Med. 157, 1149–1169 (1983).
5. Staveley-O’Carroll, K. et al. Induction of antigen-specific T cell anergy: an early event in the course of tumor progression. Proc. Natl Acad. Sci. USA 95, 1178–1183 (1998).
6. Skapenko, A., Leipe, J., Lipsky, P. E. & Schulze-Koops, H. The role of the T cell in autoimmune inflammation. Arthritis Res. Ther. 7, S4–S14 (2005).
7. Stubbington, M. J. T. et al. T cell fate and clonality inference from single-cell transcriptomes. Nat. Methods 13, 329–332 (2016).
8. Bolotin, D. A. et al. Antigen receptor repertoire profiling from RNA-seq data. Nat. Biotechnol. 35, 908–911 (2017).
9. Eltahla, A. A. et al. Linking the T cell receptor to the single cell transcriptome in antigen-specific human T cells. Immunol. Cell Biol. 94, 604–611 (2016).
10. Glanville, J. et al. Identifying specificity groups in the T cell receptor repertoire. Nature 547, 94–98 (2017).
11. Dash, P. et al. Quantifiable predictive features define epitope-specific T cell receptor repertoires. Nature 547, 89–93 (2017).
12. Tubo, N. J. et al. Single naive CD4+ T cells from a diverse repertoire produce different effector cell types during infection. Cell 153, 785–796 (2013).
13. Buchholz, V. R. et al. Disparate individual fates compose robust CD8+ T cell immunity. Science 340, 630–635 (2013).
14. Picelli, S. et al. Full-length RNA-seq from single cells using Smart-seq2. Nat. Protoc. 9, 171–181 (2014).
15. Sheng, K., Cao, W., Niu, Y., Deng, Q. & Zong, C. Effective detection of variation in single-cell transcriptomes using MATQ-seq. Nat. Methods 14, 267–270 (2017).
16. Mimitou, E. P. et al. Multiplexed detection of proteins, transcriptomes, clonotypes and CRISPR perturbations in single cells. Nat. Methods 16, 409–412 (2019).
17. Atchley, W. R., Zhao, J., Fernandes, A. D. & Drüke, T. Solving the protein sequence metric problem. Proc. Natl Acad. Sci. USA 102, 6395–6400 (2005).
18. Ballard, D. Modular learning in neural networks. In Proc. Sixth National Conference on Artificial Intelligence Vol. 1, 279–284 (ACM, 1987).
19. Ostmeyer, J. et al. Statistical classifiers for diagnosing disease from immune repertoires: a case study using multiple sclerosis. BMC Bioinf. 18, 401 (2017).
20. Ostmeyer, J., Christley, S., Toby, I. T. & Cowell, L. G. Biophysicochemical motifs in T-cell receptor sequences distinguish repertoires from tumor-infiltrating lymphocyte and adjacent healthy tissue. Cancer Res. 79, 1671–1680 (2019).
21. Thomas, N. et al. Tracking global changes induced in the CD4 T-cell receptor repertoire by immunization with a complex antigen using short stretches of CDR3 protein sequence. Bioinformatics 30, 3181–3188 (2014).
22. Zhang, A. W. et al. Interfaces of malignant and immunologic clonal dynamics in ovarian cancer. Cell 173, 1755–1769.e22 (2018).
23. Thorsson, V. et al. The immune landscape of cancer. Immunity 48, 812–830.e14 (2018).
24. Wang, T. et al. An empirical approach leveraging tumorgrafts to dissect the tumor microenvironment in renal cell carcinoma identifies missing link to prognostic inflammatory factors. Cancer Discov. 8, 1142–1155 (2018).
25. Tickotsky, N., Sagiv, T., Prilusky, J., Shifrut, E. & Friedman, N. McPAS-TCR: a manually curated catalogue of pathology-associated T cell receptor sequences. Bioinformatics 33, 2924–2929 (2017).
26. Guo, X. et al. Global characterization of T cells in non-small-cell lung cancer by single-cell sequencing. Nat. Med. 24, 978–985 (2018).
27. Zhang, L. et al. Lineage tracking reveals dynamic relationships of T cells in colorectal cancer. Nature 564, 268–272 (2018).
28. Zheng, C. et al. Landscape of infiltrating T cells in liver cancer revealed by single-cell sequencing. Cell 169, 1342–1356.e16 (2017).
29. Azizi, E. et al. Single-cell map of diverse immune phenotypes in the breast tumor microenvironment. Cell 174, 1293–1308.e36 (2018).
30. Li, H. et al. Dysfunctional CD8 T cells form a proliferative, dynamically regulated compartment within human melanoma. Cell 176, 775–789.e18 (2019).
31. Yost, K. E. et al. Clonal replacement of tumor-specific T cells following PD-1 blockade. Nat. Med. 25, 1251–1259 (2019).
32. Eduati, F. et al. Prediction of human population responses to toxic compounds by a collaborative competition. Nat. Biotechnol. 33, 933–940 (2015).
33. Bansal, M. et al. A community computational challenge to predict the activity of pairs of compounds. Nat. Biotechnol. 32, 1213–1222 (2014).
34. Costello, J. C. & Stolovitzky, G. Seeking the wisdom of crowds through challenge-based competitions in biomedical research. Clin. Pharmacol. Ther. 93, 396–398 (2013).
35. Waugh, K. A. et al. Molecular profile of tumor-specific CD8+ T cell hypofunction in a transplantable murine cancer model. J. Immunol. 197, 1477–1488 (2016).
36. Wu, A. A., Drake, V., Huang, H.-S., Chiu, S. & Zheng, L. Reprogramming the tumor microenvironment: tumor-induced immunosuppressive factors paralyze T cells. Oncoimmunology 4, e1016700 (2015).
37. Burkholder, B. et al. Tumor-induced perturbations of cytokines and immune cell networks. Biochim. Biophys. Acta 1845, 182–201 (2014).
38. Conley, J. M., Gallagher, M. P. & Berg, L. J. T cells and gene regulation: the switching on and turning up of genes after T cell receptor stimulation in CD8 T cells. Front. Immunol. (2016).
39. Cho, J.-H. et al. Unique features of naive CD8+ T cell activation by IL-2. J. Immunol. 191, 5559–5573 (2013).
40. Iezzi, G., Karjalainen, K. & Lanzavecchia, A. The duration of antigenic stimulation determines the fate of naive and effector T cells. Immunity 8, 89–95 (1998).
41. Moskophidis, D., Lechner, F., Pircher, H. & Zinkernagel, R. M. Virus persistence in acutely infected immunocompetent mice by exhaustion of antiviral cytotoxic effector T cells. Nature 362, 758–761 (1993).
42. Kalergis, A. M. et al. Efficient T cell activation requires an optimal dwell-time of interaction between the TCR and the pMHC complex. Nat. Immunol. 2, 229–234 (2001).
43. Corse, E., Gottschalk, R. A., Krogsgaard, M. & Allison, J. P. Attenuated T cell responses to a high-potency ligand in vivo. PLoS Biol. (2010).
44. Mikolov, T., Chen, K., Corrado, G. S. & Dean, J. Efficient estimation of word representations in vector space. CoRR abs/1301.3781 (2013).
45. Zhang, Z. et al. SCINA: a semi-supervised subtyping algorithm of single cells and bulk samples. Genes (2019).
46. Zhang, Z. jcao89757/TESSA: mapping the functional landscape of T cell receptor repertoire by single T cell transcriptomics. Zenodo (2020).


Acknowledgements

We thank L.H.R. Xu for his valuable input on the manuscript writing. This study was supported by the National Institutes of Health (NIH) (grant nos. CCSG 5P30CA142543 to T.W. and R15GM131390 to X.W.) and Cancer Prevention Research Institute of Texas (grant no. CPRIT RP190208 to T.W.).

Author information




Z.Z. contributed to the computational analyses and manuscript writing. D.X. and X.W. contributed to the design and write-up of the statistical methodologies. H.L. provided valuable suggestions on the direction of the project, and contributed to manuscript writing. T.W. contributed to the overall supervision of the project, study design and manuscript writing.

Corresponding author

Correspondence to Tao Wang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Madhura Mukhopadhyay was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Details of the stacked auto-encoder for TCR embedding.

a, The structure of the auto-encoder, with the configurations of each layer shown. b, Typical examples of TCR CDR3β sequences, heatmaps of the initially embedded ‘Atchley’ matrices of TCRs, and heatmaps of the auto-encoder-reconstructed ‘Atchley’ matrices. The TCR sequence examples were not used in the training step of the auto-encoder. c, Scatterplots showing the consistency between the ‘Atchley factor’ values of the original and reconstructed TCRs. Green points represent tiles in the heatmaps in b.

Extended Data Fig. 2 Scatterplots showing the relationships between the distances of TCRs and the distances of RNA expression levels for several more datasets.

Both distances are calculated in a pairwise manner between all the T cell clonotypes of each dataset. Four example datasets are shown: Healthy-CD8-3 (a), Healthy-CD8-4 (b), Breast-1 (c), and Breast-2 (d) (Supplementary Table 1). The P values indicate the significance of the Pearson correlation coefficients. The shaded areas denote the 95% confidence intervals for linear regressions.
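The comparison described in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the tessa implementation: the Euclidean metric, the helper names `pairwise_distances` and `pearson`, and the toy inputs are all assumptions made for the example, and the P values and regression confidence bands shown in the figure are not computed here.

```python
import math
from itertools import combinations


def pairwise_distances(vectors):
    """Euclidean distance for every unordered pair of rows, in a fixed order."""
    return [math.dist(a, b) for a, b in combinations(vectors, 2)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy inputs (made up for illustration): one row per T cell clonotype.
tcr_embeddings = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
expression = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 4.0, 0.0], [6.0, 6.0, 0.0]]

# Both distance lists follow the same pair order, so they can be correlated.
tcr_dist = pairwise_distances(tcr_embeddings)
expr_dist = pairwise_distances(expression)
r = pearson(tcr_dist, expr_dist)
print(f"{len(tcr_dist)} clonotype pairs, Pearson r = {r:.3f}")
```

In the toy data the expression rows are a scaled copy of the embedding rows, so the two distance lists are perfectly correlated. Real datasets would give a weaker but, according to the figure, still significant correlation.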

Extended Data Fig. 3 The weights of the TCR embeddings learned from tessa.

The X axis shows the digits of the 30-dimensional embeddings, and the Y axis shows the weights learned for all datasets. Each bar represents one digit of the weights and shows the values of that digit obtained from all 19 scRNA-seq datasets in Supplementary Table 1.

Extended Data Fig. 4 Benchmarking results using GLIPH.

a, Clustering rates of the four Healthy-CD8 datasets from 10x Genomics, the Glanville dataset, and the Dash dataset under different global convergence distance cutoff (‘gccutoff’) values (Supplementary Table 1). The dashed lines represent the tessa clustering rates of the corresponding datasets. b, Clustering purities of GLIPH when the ‘gccutoff’ equals 3. The cutoff value was selected so that the GLIPH clusters achieved clustering rates that are most similar to those of the tessa networks. The clustering purities were calculated with the same method as in Fig. 2. c, d, The GLIPH network purities (c) and number of networks (d) with different ‘gccutoff’ values, compared with the tessa network purities and the number of networks.

Extended Data Fig. 5 Clustering of TCR clonotypes informed by tessa is reflective of antigen binding specificity.

The antigen binding specificity of 207 human TCRβ chains from 704 T cells was profiled against two epitopes in the Dash dataset, and that of 276 TCRs from 415 T cells against three epitopes in the Glanville dataset. a, b, t-SNE plots showing the TCR clonotypes in the space of the TCR embeddings, with the embeddings adjusted by the tessa-inferred weights. The hierarchical clustering tree cutoff used in the two plots is represented with green dashed lines in c–f. Each point in the plots represents one TCR clonotype, and the size of the point refers to the clone size. Points are colored by the true antigens that the corresponding TCRs target according to the original report. Points are connected if they are clustered into the same network based on hierarchical clustering of the TCR embeddings. T cell clones with only one cell were deemed as having low confidence, and unclustered clones, which do not affect the calculation of the purities, were excluded from visualization. c, d, The numbers of TCR networks and the clustering rates with different hierarchical tree cutoffs in the Dash dataset (c) and in the Glanville dataset (d). Clustering rates were calculated as the number of TCR clonotypes that are clustered with at least one other TCR clonotype, divided by the total number of TCR clonotypes. e, f, The network purities and P values testing the significance of the purities with different hierarchical tree cutoffs in the Dash dataset (e) and the Glanville dataset (f). The network purity and P value calculations are described in the Methods section.
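The clustering rate defined in this caption can be illustrated with a short Python sketch. It is a hedged example only: the caption does not state which linkage the hierarchical clustering uses, so single linkage is assumed here (cutting a single-linkage tree at a cutoff is equivalent to joining every pair of points within that distance), and the function names `networks_at_cutoff` and `clustering_rate` and the toy points are hypothetical.

```python
import math
from itertools import combinations


def networks_at_cutoff(embeddings, cutoff):
    """Group points linked by a chain of pairwise distances <= cutoff.

    Equivalent to cutting a single-linkage tree at `cutoff`. Single linkage
    is an assumption; the source does not state which linkage is used.
    """
    parent = list(range(len(embeddings)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(embeddings)), 2):
        if math.dist(embeddings[i], embeddings[j]) <= cutoff:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(embeddings)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def clustering_rate(groups):
    """Fraction of clonotypes grouped with at least one other clonotype."""
    total = sum(len(g) for g in groups)
    clustered = sum(len(g) for g in groups if len(g) > 1)
    return clustered / total


# Toy weighted embeddings (made up): two tight pairs and one isolated point.
points = [[0.0, 0.0], [0.5, 0.0], [5.0, 5.0], [5.0, 5.4], [20.0, 20.0]]
groups = networks_at_cutoff(points, cutoff=1.0)
rate = clustering_rate(groups)
n_networks = sum(1 for g in groups if len(g) > 1)
print(f"{n_networks} networks, clustering rate = {rate:.2f}")
```

Raising the cutoff merges more clonotypes into networks and so increases the clustering rate, which is the trade-off that panels c and d trace across cutoffs.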

Extended Data Fig. 6 T cell pathway activity scores of the different T cell subsets in the BCC dataset.

The naive and activated pathways are shown, to be compared against the inhibition, memory and exhausted pathways shown in Fig. 3. The T cell subsets were the same as those in Fig. 3e–g.

Extended Data Fig. 7 Pseudotime analysis of the different T cell subsets in the BCC dataset.

The T cell subsets were the same as those in Fig. 3e–g.

Extended Data Fig. 8 A cartoon sketch showing how the unexplained variance in gene expression of the TCR networks was determined.

Details are described in the Methods section.

Supplementary information

Supplementary Information

Supplementary Tables 1 and 2 and Notes 1 and 2.

Reporting Summary

Source data

Source Data Fig. 1

Statistical source data.

Source Data Fig. 2

Statistical source data.

Source Data Fig. 3

Statistical source data.

Source Data Fig. 4

Statistical source data.

Source Data Extended Data Fig. 1

Statistical source data.

Source Data Extended Data Fig. 2

Statistical source data.

Source Data Extended Data Fig. 3

Statistical source data.

Source Data Extended Data Fig. 4

Statistical source data.

Source Data Extended Data Fig. 5

Statistical source data.

Source Data Extended Data Fig. 6

Statistical source data.

Source Data Extended Data Fig. 7

Statistical source data.

Source Data Extended Data Fig. 8

Statistical source data.

About this article

Cite this article

Zhang, Z., Xiong, D., Wang, X. et al. Mapping the functional landscape of T cell receptor repertoires by single-T cell transcriptomics. Nat Methods 18, 92–99 (2021).
