Abstract
Many experimental and bioinformatics approaches have been developed to characterize the human T cell receptor (TCR) repertoire. However, the unknown functional relevance of TCR profiling hinders unbiased interpretation of the biology of T cells. To address this inadequacy, we developed tessa, a tool to integrate TCRs with gene expression of T cells to estimate the effect that TCRs confer on the phenotypes of T cells. Tessa leverages techniques that combine single-cell RNA-sequencing with TCR sequencing. We validated tessa and showed its superiority over existing approaches that investigate only the TCR sequences. With tessa, we demonstrated that TCR similarity constrains the phenotypes of T cells to be similar and dictates a gradient in antigen targeting efficiency of T cell clonotypes with convergent TCRs. We showed that this constraint can predict a functional dichotomization of T cells after immunotherapy treatment and is weakened in tumor contexts.
Data availability
The bulk RNA-seq datasets used for deriving TCRs and then for the auto-encoder training are publicly available at https://gdc.cancer.gov/about-data/publications/panimmune (TCGA; ref. 23), https://www.iedb.org/database_export_v3.php (IEDB) and http://friedmanlab.weizmann.ac.il/McPAS-TCR/ (McPAS; ref. 25). We made the Kidney-bulkRNA (ref. 24) dataset available in csv format at https://github.com/jcao89757/TESSA/tree/master/Tessa_released_data. All scRNA-seq/TCR-seq datasets are publicly available. The NSCLC-1 and healthy PBMC-1 datasets are available on the 10x Genomics website at https://support.10xgenomics.com/single-cell-vdj/datasets/2.2.0. The healthy-CD8 1–4 datasets are available at https://www.10xgenomics.com/resources/application-notes/a-new-way-of-exploring-immunity-linking-highly-multiplexed-antigen-recognition-to-immune-repertoire-and-phenotype/. The healthy PBMC-2 dataset is also available on the 10x Genomics website at https://support.10xgenomics.com/single-cell-vdj/datasets/3.0.0. The NSCLC-2 (ref. 26), CRC (ref. 27) and HCC (ref. 28) datasets were downloaded from the European Genome-Phenome Archive (EGA) under accession numbers EGAS00001002430, EGAS00001002791 and EGAS00001002072, respectively. The Breast-1–5 (ref. 29) datasets are available on the Gene Expression Omnibus (GEO) under accession numbers GSE114727 and GSE114724. The Melanoma (ref. 30), BCC (ref. 31) and ECCITE-Seq (ref. 16) datasets are also on the GEO database under study numbers GSE123139, GSE113590 and GSE126310. The Glanville (ref. 10) dataset was downloaded from https://doi.org/10.1038/nature22976. The Dash (ref. 11) dataset is available in the National Center for Biotechnology Information Sequence Read Archive under accession number SRP101659. The details of the data used, including sample size, role in the analysis and references, are shown in Supplementary Table 1. All scRNA-seq data were used, directly or indirectly, in Fig. 2; the BCC scRNA-seq data were used in Fig. 3; and all scRNA-seq data were used in Fig. 4. Source data are provided with this paper.
Code availability
The tessa model is available at https://github.com/jcao89757/tessa (https://doi.org/10.5281/zenodo.4161819; ref. 46). The SCINA model is available at https://github.com/jcao89757/SCINA (https://doi.org/10.3390/genes10070531; ref. 45).
References
Oettinger, M. A. V(D)J recombination: on the cutting edge. Curr. Opin. Cell Biol. 11, 325–329 (1999).
Jung, D. & Alt, F. W. Unraveling V(D)J recombination: insights into gene regulation. Cell 116, 299–311 (2004).
Kappler, J. et al. The major histocompatibility complex-restricted antigen receptor on T cells in mouse and man: identification of constant and variable peptides. Cell 35, 295–302 (1983).
Haskins, K. et al. The major histocompatibility complex-restricted antigen receptor on T cells. I. Isolation with a monoclonal antibody. J. Exp. Med. 157, 1149–1169 (1983).
Staveley-O’Carroll, K. et al. Induction of antigen-specific T cell anergy: an early event in the course of tumor progression. Proc. Natl Acad. Sci. USA 95, 1178–1183 (1998).
Skapenko, A., Leipe, J., Lipsky, P. E. & Schulze-Koops, H. The role of the T cell in autoimmune inflammation. Arthritis Res. Ther. 7, S4–S14 (2005).
Stubbington, M. J. T. et al. T cell fate and clonality inference from single-cell transcriptomes. Nat. Methods 13, 329–332 (2016).
Bolotin, D. A. et al. Antigen receptor repertoire profiling from RNA-seq data. Nat. Biotechnol. 35, 908–911 (2017).
Eltahla, A. A. et al. Linking the T cell receptor to the single cell transcriptome in antigen-specific human T cells. Immunol. Cell Biol. 94, 604–611 (2016).
Glanville, J. et al. Identifying specificity groups in the T cell receptor repertoire. Nature 547, 94–98 (2017).
Dash, P. et al. Quantifiable predictive features define epitope-specific T cell receptor repertoires. Nature 547, 89–93 (2017).
Tubo, N. J. et al. Single naive CD4+ T cells from a diverse repertoire produce different effector cell types during infection. Cell 153, 785–796 (2013).
Buchholz, V. R. et al. Disparate individual fates compose robust CD8+ T cell immunity. Science 340, 630–635 (2013).
Picelli, S. et al. Full-length RNA-seq from single cells using Smart-seq2. Nat. Protoc. 9, 171–181 (2014).
Sheng, K., Cao, W., Niu, Y., Deng, Q. & Zong, C. Effective detection of variation in single-cell transcriptomes using MATQ-seq. Nat. Methods 14, 267–270 (2017).
Mimitou, E. P. et al. Multiplexed detection of proteins, transcriptomes, clonotypes and CRISPR perturbations in single cells. Nat. Methods 16, 409–412 (2019).
Atchley, W. R., Zhao, J., Fernandes, A. D. & Drüke, T. Solving the protein sequence metric problem. Proc. Natl Acad. Sci. USA 102, 6395–6400 (2005).
Ballard, D. Modular learning in neural networks. In Proc. Sixth National Conference on Artificial Intelligence Vol. 1, 279–284 (ACM, 1987).
Ostmeyer, J. et al. Statistical classifiers for diagnosing disease from immune repertoires: a case study using multiple sclerosis. BMC Bioinf. 18, 401 (2017).
Ostmeyer, J., Christley, S., Toby, I. T. & Cowell, L. G. Biophysicochemical motifs in T-cell receptor sequences distinguish repertoires from tumor-infiltrating lymphocyte and adjacent healthy tissue. Cancer Res. 79, 1671–1680 (2019).
Thomas, N. et al. Tracking global changes induced in the CD4 T-cell receptor repertoire by immunization with a complex antigen using short stretches of CDR3 protein sequence. Bioinformatics 30, 3181–3188 (2014).
Zhang, A. W. et al. Interfaces of malignant and immunologic clonal dynamics in ovarian cancer. Cell 173, 1755–1769.e22 (2018).
Thorsson, V. et al. The immune landscape of cancer. Immunity 48, 812–830.e14 (2018).
Wang, T. et al. An empirical approach leveraging tumorgrafts to dissect the tumor microenvironment in renal cell carcinoma identifies missing link to prognostic inflammatory factors. Cancer Discov. 8, 1142–1155 (2018).
Tickotsky, N., Sagiv, T., Prilusky, J., Shifrut, E. & Friedman, N. McPAS-TCR: a manually curated catalogue of pathology-associated T cell receptor sequences. Bioinformatics 33, 2924–2929 (2017).
Guo, X. et al. Global characterization of T cells in non-small-cell lung cancer by single-cell sequencing. Nat. Med. 24, 978–985 (2018).
Zhang, L. et al. Lineage tracking reveals dynamic relationships of T cells in colorectal cancer. Nature 564, 268–272 (2018).
Zheng, C. et al. Landscape of infiltrating T cells in liver cancer revealed by single-cell sequencing. Cell 169, 1342–1356.e16 (2017).
Azizi, E. et al. Single-cell map of diverse immune phenotypes in the breast tumor microenvironment. Cell 174, 1293–1308.e36 (2018).
Li, H. et al. Dysfunctional CD8 T cells form a proliferative, dynamically regulated compartment within human melanoma. Cell 176, 775–789.e18 (2019).
Yost, K. E. et al. Clonal replacement of tumor-specific T cells following PD-1 blockade. Nat. Med. 25, 1251–1259 (2019).
Eduati, F. et al. Prediction of human population responses to toxic compounds by a collaborative competition. Nat. Biotechnol. 33, 933–940 (2015).
Bansal, M. et al. A community computational challenge to predict the activity of pairs of compounds. Nat. Biotechnol. 32, 1213–1222 (2014).
Costello, J. C. & Stolovitzky, G. Seeking the wisdom of crowds through challenge-based competitions in biomedical research. Clin. Pharmacol. Ther. 93, 396–398 (2013).
Waugh, K. A. et al. Molecular profile of tumor-specific CD8+ T cell hypofunction in a transplantable murine cancer model. J. Immunol. 197, 1477–1488 (2016).
Wu, A. A., Drake, V., Huang, H.-S., Chiu, S. & Zheng, L. Reprogramming the tumor microenvironment: tumor-induced immunosuppressive factors paralyze T cells. Oncoimmunology 4, e1016700 (2015).
Burkholder, B. et al. Tumor-induced perturbations of cytokines and immune cell networks. Biochim. Biophys. Acta 1845, 182–201 (2014).
Conley, J. M., Gallagher, M. P. & Berg, L. J. T cells and gene regulation: the switching on and turning up of genes after T cell receptor stimulation in CD8 T cells. Front. Immunol. https://doi.org/10.3389/fimmu.2016.00076 (2016).
Cho, J.-H. et al. Unique features of naive CD8+ T cell activation by IL-2. J. Immunol. 191, 5559–5573 (2013).
Iezzi, G., Karjalainen, K. & Lanzavecchia, A. The duration of antigenic stimulation determines the fate of naive and effector T cells. Immunity 8, 89–95 (1998).
Moskophidis, D., Lechner, F., Pircher, H. & Zinkernagel, R. M. Virus persistence in acutely infected immunocompetent mice by exhaustion of antiviral cytotoxic effector T cells. Nature 362, 758–761 (1993).
Kalergis, A. M. et al. Efficient T cell activation requires an optimal dwell-time of interaction between the TCR and the pMHC complex. Nat. Immunol. 2, 229–234 (2001).
Corse, E., Gottschalk, R. A., Krogsgaard, M. & Allison, J. P. Attenuated T cell responses to a high-potency ligand in vivo. PLoS Biol. https://doi.org/10.1371/journal.pbio.1000481 (2010).
Mikolov, T., Chen, K., Corrado, G. S. & Dean, J. Efficient estimation of word representations in vector space. CoRR abs/1301.3781 (2013).
Zhang, Z. et al. SCINA: a semi-supervised subtyping algorithm of single cells and bulk samples. Genes https://doi.org/10.3390/genes10070531 (2019).
Zhang, Z. jcao89757/TESSA: mapping the functional landscape of T cell receptor repertoire by single T cell transcriptomics. Zenodo https://doi.org/10.5281/zenodo.4161819 (2020).
Acknowledgements
We thank L.H.R. Xu for his valuable input on the manuscript writing. This study was supported by the National Institutes of Health (NIH) (grant nos. CCSG 5P30CA142543 to T.W. and R15GM131390 to X.W.) and Cancer Prevention Research Institute of Texas (grant no. CPRIT RP190208 to T.W.).
Author information
Contributions
Z.Z. contributed to the computational analyses and manuscript writing. D.X. and X.W. contributed to the design and write-up of the statistical methodologies. H.L. provided valuable suggestions on the direction of the project, and contributed to manuscript writing. T.W. contributed to the overall supervision of the project, study design and manuscript writing.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information Madhura Mukhopadhyay was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Details of the stacked auto-encoder for TCR embedding.
a, The structure of the auto-encoder, with the configurations of each layer shown. b, Typical examples of TCR CDR3β sequences, heatmaps of the initially embedded ‘Atchley’ matrices of TCRs, and heatmaps of the auto-encoder-reconstructed ‘Atchley’ matrices. The TCR sequence examples were not used in the training step of the auto-encoder. c, Scatterplots showing the consistency between the ‘Atchley factor’ values of the original and reconstructed TCRs. Green points represent tiles in the heatmaps in (b).
Extended Data Fig. 2 Scatterplots showing the relationships between the distances of TCRs and the distances of RNA expression levels for several more datasets.
Both distances are calculated in a pair-wise manner between all the T cell clonotypes of each dataset. Four example datasets are shown: Healthy-CD8-3 (a), Healthy-CD8-4 (b), Breast-1 (c), and Breast-2 (d) (Supplementary Table 1). The P values indicate the significance of the Pearson correlation coefficients. The shaded areas denote the 95% confidence intervals for linear regressions.
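The pairwise comparison described in this caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the tessa implementation: the Euclidean metric, the function names and the toy data are all hypothetical, and the caption does not specify which distance measure the authors used.

```python
import math
from itertools import combinations


def pairwise_distances(rows):
    """Euclidean distance for every unordered pair of rows, in a fixed order."""
    return [math.dist(a, b) for a, b in combinations(rows, 2)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def tcr_expression_correlation(tcr_embeddings, expression_profiles):
    """Correlate pairwise TCR distances with pairwise expression distances.

    Row i of both inputs must describe the same clonotype.
    """
    return pearson(pairwise_distances(tcr_embeddings),
                   pairwise_distances(expression_profiles))


# Toy data, made up for illustration: three clonotypes with 2-D "embeddings"
# and expression profiles that are an exact 2x rescaling of the embeddings.
toy_tcr = [(0.0, 0.0), (3.0, 4.0), (6.0, 0.0)]
toy_expr = [(0.0, 0.0), (6.0, 8.0), (12.0, 0.0)]
r = tcr_expression_correlation(toy_tcr, toy_expr)
```

Because every clonotype contributes to several pairs, the same ordering of pairs must be used for both distance lists; `combinations` guarantees this as long as the two inputs list the clonotypes in the same order.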
Extended Data Fig. 3 The weights of the TCR embeddings learned from tessa.
The X axis shows the digits of the 30-dimensional embeddings, and the Y axis shows the weights learned for all datasets. Each bar represents one digit of the weights and shows the values of that digit obtained from all 19 scRNA-seq datasets in Supplementary Table 1.
Extended Data Fig. 4 Benchmarking results using GLIPH.
a, Clustering rates of the four Healthy-CD8 datasets from 10x Genomics, the Glanville dataset and the Dash dataset under different global convergence distance cutoff (‘gccutoff’) values (Supplementary Table 1). The dashed lines represent the tessa clustering rates of the corresponding datasets. b, Clustering purities of GLIPH when ‘gccutoff’ equals 3. This cutoff was selected so that the GLIPH clusters achieved clustering rates most similar to those of the tessa networks. The clustering purities were calculated with the same method as in Fig. 2. c, d, The GLIPH network purities (c) and numbers of networks (d) at different ‘gccutoff’ values, compared with the tessa network purities and numbers of networks.
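The cutoff selection described in panel b amounts to picking the candidate whose clustering rate lies nearest to a target rate. A minimal sketch follows; the function name, the tie-breaking rule and the numbers are hypothetical and are not taken from the paper or from GLIPH.

```python
def closest_cutoff(rate_by_cutoff, target_rate):
    """Return the cutoff whose clustering rate is nearest to target_rate.

    rate_by_cutoff maps each candidate cutoff to the clustering rate it yields.
    Ties are broken in favor of the smaller cutoff (an arbitrary choice).
    """
    if not rate_by_cutoff:
        raise ValueError("no cutoffs to choose from")
    return min(sorted(rate_by_cutoff),
               key=lambda c: abs(rate_by_cutoff[c] - target_rate))


# Made-up rates for illustration only; these are not values from the study.
example_rates = {1: 0.10, 2: 0.25, 3: 0.42, 4: 0.70}
chosen = closest_cutoff(example_rates, target_rate=0.40)
```

Matching clustering rates before comparing purities keeps the comparison fair, since a method that clusters very few clonotypes can reach high purity trivially.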
Extended Data Fig. 5 Clustering of TCR clonotypes informed by tessa is reflective of antigen binding specificity.
The antigen binding specificities of 207 human TCRβ chains from 704 T cells were profiled against two epitopes in the Dash dataset, and those of 276 TCRs from 415 T cells against three epitopes in the Glanville dataset. a, b, t-SNE plots showing the TCR clonotypes in the space of the TCR embeddings, with the embeddings adjusted by the tessa-inferred weights. The hierarchical clustering tree cutoff used in the two plots is shown with green dashed lines in c–f. Each point in the plots represents one TCR clonotype, and the size of the point refers to the clone size. Points are colored by the true antigens that the corresponding TCRs target according to the original report. Points are connected if they are clustered into the same network based on hierarchical clustering of the TCR embeddings. T cell clones with only one cell were deemed to have low confidence, and unclustered clones, which do not affect the calculation of the purities, were excluded from visualization. c, d, The numbers of TCR networks and the clustering rates with different hierarchical tree cutoffs in the Dash dataset (c) and in the Glanville dataset (d). Clustering rates were calculated as the number of TCR clonotypes that are clustered with at least one other TCR clonotype, divided by the total number of TCR clonotypes. e, f, The network purities and P values testing the significance of the purities with different hierarchical tree cutoffs in the Dash dataset (e) and the Glanville dataset (f). The network purity and P value calculations are described in the Methods section.
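The clustering rate defined in this caption can be computed directly from network assignments. The sketch below is illustrative only: the function name and the input layout (one network label per clonotype, with `None` for unclustered clonotypes) are assumptions, not the format that tessa produces.

```python
from collections import Counter


def clustering_rate(network_labels):
    """Fraction of clonotypes that share a network with at least one other clonotype.

    network_labels holds one network identifier per clonotype;
    None marks a clonotype that was not assigned to any network.
    """
    labels = list(network_labels)
    if not labels:
        raise ValueError("at least one clonotype is required")
    sizes = Counter(label for label in labels if label is not None)
    clustered = sum(1 for label in labels
                    if label is not None and sizes[label] >= 2)
    return clustered / len(labels)


# Seven hypothetical clonotypes: networks "a" (2 members) and "c" (3 members)
# count as clustered; the singleton "b" and the unassigned clonotype do not.
rate = clustering_rate(["a", "a", "b", None, "c", "c", "c"])
```

A network with a single member is treated the same as no network, which follows the caption's requirement that a clonotype be clustered with at least one other clonotype.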
Extended Data Fig. 6 T cell pathway activity scores of the different T cell subsets in the BCC dataset.
Extended Data Fig. 7 Pseudotime analysis of the different T cell subsets in the BCC dataset.
The T cell subsets were the same as those in Fig. 3e–g.
Extended Data Fig. 8 A cartoon sketch showing how the unexplained variance in gene expression of the TCR networks was determined.
Details were described in the Materials and Methods section.
Supplementary information
Supplementary Information
Supplementary Tables 1 and 2 and Notes 1 and 2.
Source data
Source Data Fig. 1
Statistical source data.
Source Data Fig. 2
Statistical source data.
Source Data Fig. 3
Statistical source data.
Source Data Fig. 4
Statistical source data.
Source Data Extended Data Fig. 1
Statistical source data.
Source Data Extended Data Fig. 2
Statistical source data.
Source Data Extended Data Fig. 3
Statistical source data.
Source Data Extended Data Fig. 4
Statistical source data.
Source Data Extended Data Fig. 5
Statistical source data.
Source Data Extended Data Fig. 6
Statistical source data.
Source Data Extended Data Fig. 7
Statistical source data.
Source Data Extended Data Fig. 8
Statistical source data.
About this article
Cite this article
Zhang, Z., Xiong, D., Wang, X. et al. Mapping the functional landscape of T cell receptor repertoires by single-T cell transcriptomics. Nat Methods 18, 92–99 (2021). https://doi.org/10.1038/s41592-020-01020-3
DOI: https://doi.org/10.1038/s41592-020-01020-3