Reconstituting T cell receptor selection in-silico


Each T cell receptor (TCR) gene is created without regard for which substances (antigens) the receptor can recognize. T cell selection culls developing T cells when their TCRs (i) fail to recognize major histocompatibility complexes (MHCs) that act as antigen-presenting platforms or (ii) recognize with high affinity self-antigens derived from healthy cells and tissue. While T cell selection has been thoroughly studied, little is known about which TCRs are retained or removed by this process. Therefore, we develop an approach using TCR gene sequencing and machine learning to identify patterns in TCR protein sequences that influence the outcome of T cell receptor selection. We verify that the trained models classify TCRs from developing T cells as being before selection and TCRs from mature T cells as being after selection. Our approach may provide future avenues for studying the relationship between T cell selection and conditions like autoimmune diseases.
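The abstract frames the task as binary classification of TCR protein sequences (before versus after selection) but does not describe the model. The following is a minimal NumPy sketch of that framing only, not the authors' method: it assumes a simple amino acid composition encoding and a logistic regression fit by gradient descent, and it runs on synthetic sequences. The function names, the sequence length, and the glycine skew used to make the two toy classes separable are all hypothetical and carry no biological meaning.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def composition_features(sequence):
    """Return the fraction of each of the 20 amino acids in a sequence."""
    counts = np.zeros(len(AMINO_ACIDS))
    for aa in sequence:
        counts[AA_INDEX[aa]] += 1
    return counts / max(len(sequence), 1)


def make_toy_sequences(rng, n, probs, length=14):
    """Draw n synthetic sequences with per-residue probabilities `probs`."""
    letters = np.array(list(AMINO_ACIDS))
    return ["".join(rng.choice(letters, size=length, p=probs)) for _ in range(n)]


def predict_proba(X, w, b):
    """Logistic model: probability that each row belongs to class 1."""
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))


def train_logistic_regression(X, y, lr=2.0, steps=3000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        error = predict_proba(X, w, b) - y
        w -= lr * (X.T @ error) / len(y)
        b -= lr * error.mean()
    return w, b


rng = np.random.default_rng(0)

# Class 0 ("before selection" stand-in): uniform residue usage.
uniform = np.full(len(AMINO_ACIDS), 1.0 / len(AMINO_ACIDS))
# Class 1 ("after selection" stand-in): arbitrary, non-biological skew toward G,
# chosen only so that the toy task is learnable.
skewed = uniform.copy()
skewed[AA_INDEX["G"]] *= 6
skewed /= skewed.sum()

sequences = make_toy_sequences(rng, 200, uniform) + make_toy_sequences(rng, 200, skewed)
X = np.array([composition_features(s) for s in sequences])
y = np.array([0.0] * 200 + [1.0] * 200)

# Shuffle, then hold out the last 100 sequences for evaluation.
order = rng.permutation(len(y))
X, y = X[order], y[order]
X_train, y_train = X[:300], y[:300]
X_test, y_test = X[300:], y[300:]

w, b = train_logistic_regression(X_train, y_train)
accuracy = np.mean((predict_proba(X_test, w, b) > 0.5) == y_test)
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

A composition encoding discards residue order, so it cannot capture positional motifs; it is used here only to keep the sketch short. The held-out accuracy reflects the synthetic skew and says nothing about real repertoires.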


Fig. 1
Fig. 2: Non-productive TCR genes do not express a functional TCR chain.
Fig. 3
Fig. 4

Code availability

All computer code is written in Python 3 using NumPy, TensorFlow v1.14, and Keras. Aspects of the computer code can be found at A full copy of the computer source code, detailed instructions for running the code, and trained models are available upon request under a signed confidentiality agreement. Email the corresponding author if interested.




JO is grateful to the Department of Population & Data Sciences and the University of Texas Southwestern Medical Center for the salary support he received for this study.


JO used his protected time from the Department of Population & Data Sciences to conduct this study. LC and SC may have been supported in part by the US National Institute of Allergy and Infectious Diseases (NIAID) (R01AI097403) and the EU Framework Programme for Research and Innovation (825821).

Author information

Authors and Affiliations


Corresponding author

Correspondence to Jared Ostmeyer.

Ethics declarations


A provisional patent has been filed based on this study.

About this article

Cite this article

Ostmeyer, J., Cowell, L., Greenberg, B. et al. Reconstituting T cell receptor selection in-silico. Genes Immun 22, 187–193 (2021).
