Landscape of tumor-infiltrating T cell repertoire of human cancers


We developed a computational method to infer the complementarity-determining region 3 (CDR3) sequences of tumor-infiltrating T cells in 9,142 RNA-seq samples across 29 cancer types. We identified over 600,000 CDR3 sequences, including 15% that were full length. CDR3 sequence length distribution and amino acid conservation, as well as variable gene usage, for infiltrating T cells in many tumors, except in brain and kidney cancers, resembled those for peripheral blood cells from healthy donors. We observed a strong association between T cell diversity and tumor mutation load, and we predicted SPAG5 and TSSK6 as putative immunogenic cancer/testis antigens in multiple cancers. Finally, we identified three potential immunogenic somatic mutations on the basis of their co-occurrence with CDR3 sequences. One of them, a PRAMEF4 mutation encoding p.Phe300Val, was predicted to result in peptide binding strongly to both MHC class I and class II molecules, with matched HLA types in its carriers. Our analyses have the potential to simultaneously identify immunogenic neoantigens and tumor-reactive T cell clonotypes.

Figure 1: Distribution of αβ T cell variable gene usage and γδ T cell abundance across multiple cancer types.
Figure 2: Length and amino acid conservation of β- and δ-chain CDR3 sequences in tumor-infiltrating T cells.
Figure 3: Public and private β-CDR3 amino acid sequences have different lengths and proportions of hydrophobic residues.
Figure 4: The diversity of T cell clonotypes positively associates with cancer somatic mutation load.
Figure 5: Association of T cell diversity with expression of cancer/testis antigens identifies SPAG5 and TSSK6 as vaccine targets.
Figure 6: Nonsynonymous mutations co-occur with CDR3 motifs.


  1. Alt, F.W. et al. VDJ recombination. Immunol. Today 13, 306–314 (1992).

  2. Davis, M.M. & Bjorkman, P.J. T-cell antigen receptor genes and T-cell recognition. Nature 334, 395–402 (1988).

  3. Warren, R.L. et al. Exhaustive T-cell repertoire sequencing of human peripheral blood samples reveals signatures of antigen selection and a directly measured repertoire size of at least 1 million clonotypes. Genome Res. 21, 790–797 (2011).

  4. Robins, H.S. et al. Comprehensive assessment of T-cell receptor β-chain diversity in αβ T cells. Blood 114, 4099–4107 (2009).

  5. Rosenberg, S.A., Restifo, N.P., Yang, J.C., Morgan, R.A. & Dudley, M.E. Adoptive cell transfer: a clinical path to effective cancer immunotherapy. Nat. Rev. Cancer 8, 299–308 (2008).

  6. Sharma, P., Wagner, K., Wolchok, J.D. & Allison, J.P. Novel cancer immunotherapy agents with survival benefit: recent successes and next steps. Nat. Rev. Cancer 11, 805–812 (2011).

  7. Pardoll, D.M. The blockade of immune checkpoints in cancer immunotherapy. Nat. Rev. Cancer 12, 252–264 (2012).

  8. Savage, P.A. et al. Recognition of a ubiquitous self antigen by prostate cancer-infiltrating CD8+ T lymphocytes. Science 319, 215–220 (2008).

  9. Obenaus, M. et al. Identification of human T-cell receptors with optimal affinity to cancer antigens using antigen-negative humanized mice. Nat. Biotechnol. 33, 402–407 (2015).

  10. Tumeh, P.C. et al. PD-1 blockade induces responses by inhibiting adaptive immune resistance. Nature 515, 568–571 (2014).

  11. Twyman-Saint Victor, C. et al. Radiation and dual checkpoint blockade activate non-redundant immune mechanisms in cancer. Nature 520, 373–377 (2015).

  12. Blachly, J.S. et al. Immunoglobulin transcript sequence and somatic hypermutation computation from unselected RNA-seq reads in chronic lymphocytic leukemia. Proc. Natl. Acad. Sci. USA 112, 4322–4327 (2015).

  13. Brown, S.D., Raeburn, L.A. & Holt, R.A. Profiling tissue-resident T cell repertoires by RNA sequencing. Genome Med. 7, 125 (2015).

  14. Bolotin, D.A. et al. MiTCR: software for T-cell receptor sequencing data analysis. Nat. Methods 10, 813–814 (2013).

  15. Grabherr, M.G. et al. Full-length transcriptome assembly from RNA-seq data without a reference genome. Nat. Biotechnol. 29, 644–652 (2011).

  16. Warren, R.L., Nelson, B.H. & Holt, R.A. Profiling model T-cell metagenomes with short reads. Bioinformatics 25, 458–464 (2009).

  17. Freeman, J.D., Warren, R.L., Webb, J.R., Nelson, B.H. & Holt, R.A. Profiling the T-cell receptor β-chain repertoire by massively parallel sequencing. Genome Res. 19, 1817–1824 (2009).

  18. van Heijst, J.W. et al. Quantitative assessment of T cell repertoire recovery after hematopoietic stem cell transplantation. Nat. Med. 19, 372–377 (2013).

  19. Rooney, M.S., Shukla, S.A., Wu, C.J., Getz, G. & Hacohen, N. Molecular and genetic properties of tumors associated with local immune cytolytic activity. Cell 160, 48–61 (2015).

  20. Chien, Y.H. & Hampl, J. Antigen-recognition properties of murine γδ T cells. Springer Semin. Immunopathol. 22, 239–250 (2000).

  21. Crooks, G.E., Hon, G., Chandonia, J.M. & Brenner, S.E. WebLogo: a sequence logo generator. Genome Res. 14, 1188–1190 (2004).

  22. Dean, J. et al. Annotation of pseudogenic gene segments by massively parallel sequencing of rearranged lymphocyte receptor loci. Genome Med. 7, 123 (2015).

  23. Rock, E.P., Sibbald, P.R., Davis, M.M. & Chien, Y.H. CDR3 length in antigen-specific immune receptors. J. Exp. Med. 179, 323–328 (1994).

  24. Venturi, V., Price, D.A., Douek, D.C. & Davenport, M.P. The molecular basis for public T-cell responses? Nat. Rev. Immunol. 8, 231–238 (2008).

  25. Chowell, D. et al. TCR contact residue hydrophobicity is a hallmark of immunogenic CD8+ T cell epitopes. Proc. Natl. Acad. Sci. USA 112, E1754–E1762 (2015).

  26. Schuurs, A.H. & Verheul, H.A. Effects of gender and sex steroids on the immune response. J. Steroid Biochem. 35, 157–172 (1990).

  27. Shukla, S.A. et al. Comprehensive analysis of cancer-associated somatic mutations in class I HLA genes. Nat. Biotechnol. 33, 1152–1158 (2015).

  28. He, C. et al. Genome-wide detection of testis- and testicular cancer–specific alternative splicing. Carcinogenesis 28, 2484–2490 (2007).

  29. Simpson, A.J., Caballero, O.L., Jungbluth, A., Chen, Y.T. & Old, L.J. Cancer/testis antigens, gametogenesis and cancer. Nat. Rev. Cancer 5, 615–625 (2005).

  30. Caballero, O.L. & Chen, Y.T. Cancer/testis (CT) antigens: potential targets for immunotherapy. Cancer Sci. 100, 2014–2021 (2009).

  31. Drake, C.G., Lipson, E.J. & Brahmer, J.R. Breathing new life into immunotherapy: review of melanoma, lung and kidney cancer. Nat. Rev. Clin. Oncol. 11, 24–37 (2014).

  32. Siliņa, K. et al. Sperm-associated antigens as targets for cancer immunotherapy: expression pattern and humoral immune response in cancer patients. J. Immunother. 34, 28–44 (2011).

  33. Andreatta, M., Schafer-Nielsen, C., Lund, O., Buus, S. & Nielsen, M. NNAlign: a web-based prediction method allowing non-expert end-user discovery of sequence motifs in quantitative peptide data. PLoS One 6, e26781 (2011).

  34. Dunn, G.P., Bruce, A.T., Ikeda, H., Old, L.J. & Schreiber, R.D. Cancer immunoediting: from immunosurveillance to tumor escape. Nat. Immunol. 3, 991–998 (2002).

  35. Andreatta, M. & Nielsen, M. Gapped sequence alignment using artificial neural networks: application to the MHC class I system. Bioinformatics 32, 511–517 (2016).

  36. Nielsen, M., Lundegaard, C. & Lund, O. Prediction of MHC class II binding affinity using SMM-align, a novel stabilization matrix alignment method. BMC Bioinformatics 8, 238 (2007).

  37. Grupp, S.A. et al. Chimeric antigen receptor–modified T cells for acute lymphoid leukemia. N. Engl. J. Med. 368, 1509–1518 (2013).

  38. Porter, D.L., Levine, B.L., Kalos, M., Bagg, A. & June, C.H. Chimeric antigen receptor–modified T cells in chronic lymphoid leukemia. N. Engl. J. Med. 365, 725–733 (2011).

  39. Cancer Genome Atlas Research Network. Comprehensive molecular portraits of human breast tumours. Nature 490, 61–70 (2012).

  40. Li, B. & Li, J.Z. A general framework for analyzing tumor subclonality using SNP array and DNA sequencing data. Genome Biol. 15, 473 (2014).

  41. Wang, K. et al. MapSplice: accurate mapping of RNA-seq reads for splice junction discovery. Nucleic Acids Res. 38, e178 (2010).

  42. Lefranc, M.P. IMGT, the International ImMunoGeneTics Information System. Cold Spring Harb. Protoc. 2011, 595–603 (2011).

  43. Del Monte, U. Does the cell number 10^9 still really fit one gram of tumor tissue? Cell Cycle 8, 505–506 (2009).

  44. Emerson, R.O. et al. High-throughput sequencing of T-cell receptors reveals a homogeneous repertoire of tumour-infiltrating lymphocytes in ovarian cancer. J. Pathol. 231, 433–440 (2013).

  45. R Development Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, 2014).


We thank G. Freeman for helpful discussion during manuscript preparation. We also acknowledge the following funding sources for supporting our work: NCI grant 1U01 CA180980, National Natural Science Foundation of China grant 31329003 and a Chinese Scholarship Council Fellowship. This work was supported in part by NIH/NCI DF/HCC Kidney Cancer SPORE P50 CA101942 to S.S. and T.K.C.

Author information

Contributions

B.L. conceived this project, developed the CDR3 calling method, processed the data sets and performed statistical analysis. T.L. performed statistical analysis, generated a subset of the figures and helped write the manuscript. B.W., J.W. and R.D. helped with analysis of CDR3 sequences. S.A.S. performed analyses using POLYSOLVER. Q.C. helped analyze the data. J.-C.P., S.S. and T.K.C. conducted experimental validation. F.S.H., C.W. and N.H. conceived some of the analyses and contributed to the manuscript. X.S.L. and J.S.L. supervised the whole study and wrote the manuscript with B.L.

Corresponding authors

Correspondence to Jun S Liu or X Shirley Liu.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Workflow of CDR3 sequence assembly from RNA-seq data.

Paired-end short-read RNA-seq data were mapped to the human reference genome hg19, and unmapped reads in the TCR regions were extracted for pairwise comparison. CDR3 sequences were assembled from disjoint read sets and annotated using IMGT nomenclature.
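
The grouping of reads into disjoint sets can be pictured with a short sketch. This is an illustration only and not the authors' implementation: the overlap rule in `overlaps` (an exact suffix-prefix match of at least `min_len` bases), the function names and the union-find grouping are all assumptions made for clarity. Read mapping and extraction of unmapped reads are not shown.

```python
from itertools import combinations


def overlaps(a: str, b: str, min_len: int = 10) -> bool:
    """Hypothetical overlap rule: a suffix of one read equals a prefix of the other."""
    for x, y in ((a, b), (b, a)):
        for k in range(min(len(x), len(y)), min_len - 1, -1):
            if x[-k:] == y[:k]:
                return True
    return False


def disjoint_read_sets(reads, min_len=10):
    """Group reads so that any two overlapping reads end up in the same set."""
    parent = list(range(len(reads)))

    def find(i):
        # Follow parent links to the set representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Pairwise comparison of all reads; overlapping pairs are merged.
    for i, j in combinations(range(len(reads)), 2):
        if overlaps(reads[i], reads[j], min_len):
            parent[find(i)] = find(j)

    groups = {}
    for i, read in enumerate(reads):
        groups.setdefault(find(i), []).append(read)
    return list(groups.values())
```

Each returned set could then be assembled independently, which is the property that makes the sets "disjoint" in this sketch. The pairwise loop is quadratic in the number of reads, so it is only practical for the small read sets left after filtering.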

Supplementary Figure 2 Number of reads/contigs at each step of the CDR3 assembly method.

For a selected sample, we demonstrate the number of reads or contigs kept at each step of our method. The numbers are included at the bottom of each text box. The selected sample represents the median library size of TCGA tumors with the median number of assembled CDR3 sequences.

Supplementary Figure 3 Method evaluation using TCGA tumors profiled with both TCRβ sequencing and RNA-seq.

Left, relationship between CDR3 transcripts called from TCR-seq and RNA-seq data. Middle, distribution of clonal frequencies of RNA-seq assemblies and TCR-seq transcripts. Right, another visualization of the clonal frequency distribution: the x axis shows the quantiles of clonal frequencies from immunoSeq data, and the y axis shows the fraction of above-quantile TCR transcripts called from RNA-seq data.

Supplementary Figure 4 Schematics of two simulation approaches used to validate the method developed in this work.

Descriptions for each approach can be found in the Online Methods.

Supplementary Figure 5 Performance evaluation of the CDR3 assembly algorithm using an in silico mixture experiment.

Our method was applied to data sets produced by the second simulation approach (Online Methods), and the CDR3 calls were compared to the gold standard TCR sequencing reads. (a) At different levels of T cell infiltration, our method recovered 4–6% of the infiltrating T cell repertoire, with 94–98% accuracy. (b) The called CDR3 sequences (infiltration level of 60%) were enriched for T cells with high clonal frequency. (c) Quantile–quantile plot showing that the clonal frequency for called CDR3 sequences is skewed to the higher end in comparison to the background distribution.

Supplementary Figure 6 Evaluation of the CDR3 assembly algorithm at high coverage and comparison with iSSAKE.

Both methods were applied to analyze the data sets produced from the first simulation approach (Online Methods). Called CDR3 sequences were compared to the 100 simulated transcripts, and true or false positive rates were calculated. False positive calls were defined as contigs that did not contain the CDR3 region. Standard deviation was estimated using 100 simulations at each given level of coverage. The true positive rate was the number of unique correct calls divided by 100, and the false positive rate was the number of unique incorrect calls divided by the total number of CDR3 calls. (a,b) These results were visualized as box plots for our method (a) and iSSAKE (b). We did not include a precision recall curve at each coverage setting because there was not a continuous threshold that would affect the performance in our algorithm.
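
The two rates defined in this caption can be written out as a small sketch. It assumes that a call counts as correct when it exactly matches one of the simulated sequences and that "total number of CDR3 calls" means unique calls; both are simplifying assumptions, and `evaluate_calls` is a hypothetical helper, not part of the published method.

```python
def evaluate_calls(called, simulated):
    """Return (true positive rate, false positive rate) for a set of CDR3 calls.

    called    -- sequences reported by the caller (duplicates allowed)
    simulated -- ground-truth simulated sequences (100 in the caption)
    """
    truth = set(simulated)
    unique_calls = set(called)
    correct = unique_calls & truth      # unique correct calls
    incorrect = unique_calls - truth    # unique incorrect calls
    tpr = len(correct) / len(truth)
    # Assumption: the denominator is the number of unique calls.
    fpr = len(incorrect) / len(unique_calls) if unique_calls else 0.0
    return tpr, fpr
```

Repeating this over the 100 simulations at each coverage level would give the distributions summarized by the box plots.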

Supplementary Figure 7 Differential usage of TRAV and TRBV genes in lower-grade glioma and kidney clear cell cancer.

(a–c) Bar plots of TRAV and TRBV gene usage in glioma (a,b) and kidney tumors (c) are presented. TRAV and TRBV genes are in the same order as in Figure 1a,b, and the fractions were calculated in the same way.

Supplementary Figure 8 Distribution of read counts for assembled CDR3 contigs.

Read counts for each CDR3 contig were obtained from the assembly algorithm. When shared across multiple contigs, the count for a read was evenly split between each contig.
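
A minimal sketch of this even-split counting rule follows. The input format (a mapping from read identifier to the contigs the read supports) and the name `contig_read_counts` are assumptions for illustration.

```python
from collections import defaultdict


def contig_read_counts(read_to_contigs):
    """Split each read's count evenly across the contigs that share it."""
    counts = defaultdict(float)
    for contigs in read_to_contigs.values():
        unique = set(contigs)
        if not unique:
            continue  # read supports no contig
        share = 1.0 / len(unique)
        for contig in unique:
            counts[contig] += share
    return dict(counts)
```

With this rule the per-contig counts always sum to the number of reads that support at least one contig, so no read is counted twice.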

Supplementary Figure 9 Association of CPK with genes involved in cytolysis.

The expression levels of previously defined cytolytic genes (refs. 19 and 27) were associated with CPK. The heat map displays values from partial Spearman’s correlation corrected for tumor purity. Cancers with fewer than ten samples were excluded. Statistical significance was evaluated using partial Spearman’s correlation test.
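
For readers unfamiliar with partial Spearman's correlation, the coefficient can be sketched with the standard first-order partial correlation formula applied to ranks. This is a plain-Python illustration under that assumption and may differ from the software the authors used; it returns only the coefficient, not the significance test. Here `x` would stand for gene expression, `y` for CPK and `z` for tumor purity.

```python
import math


def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_spearman(x, y, z):
    """Spearman correlation of x and y after controlling for z."""
    rx, ry, rz = ranks(x), ranks(y), ranks(z)
    r_xy, r_xz, r_yz = pearson(rx, ry), pearson(rx, rz), pearson(ry, rz)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
```

Because only ranks enter the calculation, the result is unchanged by any monotone transformation of the inputs, such as log-scaling expression values.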

Supplementary Figure 10 Scatterplot between CPK and mutation load.

Each point on the plot represents a cancer sample, with color referring to the corresponding disease type. The statistical significance of the association was evaluated using Spearman’s correlation. This represents a complementary analysis to Figure 4b.

Supplementary Figure 11 Fraction of public β-CDR3 sequences across cancer types.

For each cancer type, the fraction was calculated as the number of β-CDR3 sequences in the final public sequence set divided by the number of total distinct β-CDR3 sequences. All fractions were then mean centered, with the mean being the number of total public β-CDR3 sequences divided by the number of total distinct β-CDR3 sequences. Significance was evaluated using the binomial test, with the mean being the expected frequency and counts for public and total β-CDR3 sequences for each cancer as observations.
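
The calculation described here can be sketched as follows. The input layout (`{cancer: (n_public, n_total)}`) and the function names are hypothetical, and the two-sided p-value uses one common convention (summing the probabilities of all outcomes no more likely than the observed one); the caption does not say which convention the authors used.

```python
import math


def binom_pmf(k, n, p):
    """Binomial probability computed in log space to avoid overflow (0 < p < 1)."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)


def binom_test_two_sided(k, n, p):
    """Two-sided exact binomial test: sum outcomes no more likely than k."""
    cutoff = binom_pmf(k, n, p) * (1 + 1e-7)  # small tolerance for float ties
    total = 0.0
    for i in range(n + 1):
        prob = binom_pmf(i, n, p)
        if prob <= cutoff:
            total += prob
    return min(1.0, total)


def public_fraction_summary(counts):
    """counts maps cancer type -> (public beta-CDR3 count, distinct beta-CDR3 count)."""
    total_public = sum(public for public, _ in counts.values())
    total_distinct = sum(distinct for _, distinct in counts.values())
    mean = total_public / total_distinct  # expected frequency across all cancers
    summary = {}
    for cancer, (public, distinct) in counts.items():
        summary[cancer] = {
            "centered_fraction": public / distinct - mean,
            "p_value": binom_test_two_sided(public, distinct, mean),
        }
    return summary
```

A positive centered fraction indicates a cancer type with more public sequences than the pooled average, and a negative value indicates fewer.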

Supplementary Figure 12 MHC I binding predictions for SPAG5 and TSSK6 protein sequences.

Complete amino acid sequences for SPAG5 and TSSK6 were obtained from the NCBI protein database. All tiling nine-amino-acid sequences were analyzed by NetMHC4.0 for MHC I binding predictions. The peptides with strong binding (rank <0.5%) to an MHC I allele are underlined, and the corresponding MHC I allele is labeled by color. Only common MHC I alleles (HLA-A01:01, HLA-A02:01, HLA-A03:01, HLA-A07:02 and HLA-B08:01) with high population frequencies are displayed in the plot for visualization purposes.
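
Tiling a protein into overlapping nine-amino-acid peptides, and then keeping the strong binders, might look like the sketch below. No prediction tool is called: the `(peptide, allele, percent_rank)` tuples are a hypothetical stand-in for prediction output, and the 0.5% cutoff follows the caption.

```python
def tile_peptides(protein, length=9):
    """Return every overlapping window of `length` residues, in sequence order."""
    return [protein[i:i + length] for i in range(len(protein) - length + 1)]


def strong_binders(predictions, max_rank=0.5):
    """Keep (peptide, allele, percent_rank) rows with rank below the cutoff.

    `predictions` is an assumed format for illustration, not a real tool's output.
    """
    return [row for row in predictions if row[2] < max_rank]
```

A protein of N residues yields N - 8 nine-residue peptides, so each position except those near the termini is covered by nine windows.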

Supplementary Figure 13 MHC II binding predictions for peptides produced from PRAMEF4 F300V.

MHC II binding was predicted by NetMHC-II 2.2. Fifteen-amino-acid sequences are the standard input for the webserver, and one mutated peptide was predicted to bind to three MHC II alleles with high affinity.
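
Generating the fifteen-amino-acid peptides that span a substitution such as F300V can be sketched as below. The function name and arguments are hypothetical, and the PRAMEF4 sequence is not reproduced here, so the usage line relies on a toy sequence.

```python
def mutant_windows(protein, position, ref, alt, length=15):
    """Return all `length`-residue windows containing the substituted residue.

    position is 1-based, as in p.Phe300Val.
    """
    index = position - 1
    if protein[index] != ref:
        raise ValueError("reference residue does not match the sequence")
    mutated = protein[:index] + alt + protein[index + 1:]
    first = max(0, index - length + 1)          # earliest window covering the site
    last = min(index, len(mutated) - length)    # latest window that still fits
    return [mutated[s:s + length] for s in range(first, last + 1)]


# Toy example: a phenylalanine at position 11 replaced by valine, 5-residue windows.
toy_windows = mutant_windows("A" * 10 + "F" + "A" * 10, 11, "F", "V", length=5)
```

For a substitution far from either terminus this produces fifteen windows, one for each possible position of the mutated residue within the peptide.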

Supplementary Figure 14 Box plot of PRAMEF4 expression levels in multiple cancer types and paired normal tissues.

Testicular cancer (TGCT) is highlighted by the blue box. Numbers of outliers are included in red along the top of the plot.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–14 and Supplementary Tables 1–3. (PDF 2171 kb)

Supplementary Data

Distinct deidentified CDR3 sequence calls generated in this study in fasta format. (ZIP 4151 kb)

Cite this article

Li, B., Li, T., Pignon, JC. et al. Landscape of tumor-infiltrating T cell repertoire of human cancers. Nat Genet 48, 725–732 (2016).
