Introduction

Adaptive T cell immunity depends on a pool of diverse T cell receptors (TCRs) that enable the host to mount specific T cell responses against an enormous array of antigenic peptides presented by class I and class II major histocompatibility complex (MHC) molecules1. Antigen-specific T cell responses are characterized by cells expressing biased profiles of TCRs that are selected from a diverse, naïve repertoire. In most T cell responses, the TCR repertoires responding to a particular antigenic epitope differ between individuals. An immune response to a specific epitope that predominantly involves T cells bearing TCRs rarely observed in multiple individuals is called a private T cell response. In contrast, some antigen-specific TCR repertoires consist of TCRs that are frequently observed in multiple individuals (a public T cell response). Although they are often seen as an unusual phenomenon, public TCRs have been described in a variety of immune responses, including infectious diseases, malignancy and autoimmunity (Table 1 and ref. 25).

Table 1 Examples of public TCRs in humans

The first observation of a public TCR came from a study of HLA-B*0801-restricted CD8+ T cell clones specific for the EBV EBNA-3A339−347 peptide, in which a residue-identical TRBV7-6/TRBJ2-7/TRAV26-2/TRAJ52 TCR was shared among four randomly selected individuals2. Since then, public TCRs have been reported in a variety of infectious diseases (Table 1), including human cytomegalovirus3,4, parvovirus B195, Clostridium tetani6, herpes simplex virus7, and HIV8,9,10. Public TCRs have also been observed among tumor-associated antigen-specific T cells in melanoma11,12,13,14,15, synovial sarcoma and prostate cancer16,17 (Table 1), and in autoimmune diseases such as multiple sclerosis18, reactive arthritis19, aplastic anemia20, psoriasis vulgaris21, systemic sclerosis22, sarcoidosis23, and rheumatoid arthritis24 (Table 1). In addition, public TCRs have been observed extensively in non-human primates and mice25. Notably, public TCRs were shown to lead to favorable biological outcomes in acute SIV infection26. Studies of HIV-infected individuals with long-term non-progressive disease have also revealed shared TCRs that display effective cross-recognition of epitope variants9,27,28,29. However, public TCR usage among individuals has also been reported to facilitate viral immune escape30. Therefore, although public TCRs are widespread in pathogen-specific T cell responses, their relative benefits and drawbacks are yet to be fully defined25. Given the frequent occurrence of public TCRs in these immune responses, understanding the cause and the role of public T cell responses could be useful for the development of vaccines against infectious diseases, and perhaps even of therapeutic interventions for autoimmune and malignant diseases25.

The prerequisite for a public T cell response is the sharing of TCRs in the naïve T cell repertoires of different individuals. Indeed, a large degree of overlap has been observed between the naïve TCR repertoires of inbred mice31,32 and of humans33,34. This sharing of TCRs within the naïve T cell pools of multiple individuals provides the molecular basis for public T cell responses, enabling epitope-specific clonotype selection based on optimal TCR recognition to operate on a partially common platform35,36,37. In the following sections, we discuss the determinants of the overlap between naïve TCR repertoires, which lays the foundation for public T cell responses.
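As a rough illustration of what "sharing" and "overlap" mean at the repertoire level, the following minimal sketch counts how many individuals carry each clonotype and computes a pairwise overlap. The Jaccard index is used here purely as an assumption; the cited studies may quantify overlap differently, and all sequences and individual names below are hypothetical.

```python
from collections import Counter
from itertools import combinations


def sharing_counts(repertoires):
    """Return, for each clonotype, the number of individuals in which it is observed."""
    counts = Counter()
    for clonotypes in repertoires.values():
        counts.update(set(clonotypes))  # count each clonotype once per individual
    return counts


def pairwise_overlap(repertoires):
    """Jaccard index between every pair of individuals (an assumed overlap metric)."""
    overlaps = {}
    for a, b in combinations(sorted(repertoires), 2):
        set_a, set_b = set(repertoires[a]), set(repertoires[b])
        union = set_a | set_b
        overlaps[(a, b)] = len(set_a & set_b) / len(union) if union else 0.0
    return overlaps


# Hypothetical CDR3 amino-acid sequences for three hypothetical mice.
repertoires = {
    "mouse_1": ["SSLGAE", "SSQGTE", "SSPGQY"],
    "mouse_2": ["SSLGAE", "SSQGTE", "SSRDWG"],
    "mouse_3": ["SSLGAE", "SSGTGY"],
}

shared = sharing_counts(repertoires)
overlaps = pairwise_overlap(repertoires)
```

In this toy data set, a clonotype seen in all three individuals would be the most "public", whereas clonotypes seen in a single individual would be "private".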

Public T cell responses rely on shared TCRs generated in initial recombination

Public T cell responses depend on mature naïve T cells from different individuals that bear the same TCRs. These T cells could be favorably selected during T cell development, commonly produced during initial recombination, or both. Several mechanisms have been proposed to generate public T cell responses, including a structure-based interaction between TCR and pMHC35,36 and biases during thymic selection. Because no public T cell response can arise unless TCRs are shared among individuals, identical TCRs must be generated during initial recombination. Indeed, studies have shown extensive overlaps in the TCR repertoires of CD4+CD8+ (DP) thymocytes and naïve T cells. Because the characteristics of the TCR repertoires in DP thymocytes and naïve T cells are very similar, thymic selection seems to play a minor role in determining which TCRs are shared among individuals; thus the common TCRs available for public T cell responses derive mainly from initial V(D)J recombination. Although V(D)J recombination is generally considered a largely random process, which would make TCR sharing among individuals improbable, it must be subject to substantial constraints in order to produce the TCR sharing that is commonly observed.

How does initial V(D)J recombination determine TCR sharing?

The available data suggest that convergent recombination37,38,39,40 and biases during recombination33,37,41 are the major contributors to TCR sharing among individuals. Convergent recombination is the process whereby multiple recombination events “converge” to produce the same nucleotide sequence and multiple nucleotide sequences “converge” to encode the same amino-acid sequence (Figure 1), with the result that different TCR sequences are generated at different frequencies during recombination37,38,39,40. Recombinatorial biases include biased V/D/J gene usage and combination, bias in the number of nucleotide deletions at the coding ends of V/D/J gene segments, bias in the number of nucleotide additions and bias in base usage at the V-D/D-J junctions33,37,41,42,43. How these two determinants generate the substantial sharing of TCRs among individuals during initial recombination is discussed below.

Figure 1

The process of convergent recombination proposed by Venturi et al.38. Convergent recombination is illustrated for the amino-acid sequence SSLGAE within the Vβ12-1-Jβ2-3 combination. (A) Gene segments used for the mouse TCR β-chain. (B) Multiple recombination mechanisms (involving different contributions from the germline genes and nucleotide additions) can produce the same nucleotide sequence agc tct ctg ggt gca gaa. Possible alignments with the Vβ12-1 (blue), Dβ1/Dβ2 (red), and Jβ2-3 (green) gene segments involving different numbers of nucleotide additions (black) are shown. (C) Twelve unique nucleotide sequences can encode an identical amino-acid sequence SSLGAE.
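The second level of convergence, in which several nucleotide sequences encode one amino-acid sequence, can be sketched in a few lines. The example below translates nucleotide sequences with the standard genetic code and groups them by the amino-acid sequence they encode. The first input sequence is the one given in the Figure 1 legend; the other two are hypothetical synonymous variants chosen for illustration and are not taken from the figure.

```python
from collections import defaultdict
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def normalize(nt_seq):
    """Strip spaces and upper-case a nucleotide sequence."""
    return nt_seq.replace(" ", "").upper()


def translate(nt_seq):
    """Translate an in-frame nucleotide sequence into one-letter amino acids."""
    seq = normalize(nt_seq)
    if len(seq) % 3:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def group_by_amino_acid(nt_seqs):
    """Map each amino-acid sequence to the distinct nucleotide sequences encoding it."""
    groups = defaultdict(set)
    for nt_seq in nt_seqs:
        groups[translate(nt_seq)].add(normalize(nt_seq))
    return dict(groups)


observed = [
    "agc tct ctg ggt gca gaa",  # sequence quoted in the Figure 1 legend
    "agt tct ctg ggt gca gaa",  # hypothetical synonymous variant
    "agc tcc ctg ggg gca gaa",  # hypothetical synonymous variant
]
groups = group_by_amino_acid(observed)
```

The number of distinct nucleotide sequences per amino-acid sequence, `len(groups[aa])`, is the kind of quantity that the studies discussed here relate to the extent of sharing. Note that only some synonymous sequences are reachable from a given pair of germline segments, so the count observed in a real repertoire is expected to be much smaller than the theoretical codon degeneracy.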

Convergent recombination

“Convergent recombination” was first proposed in 2006 as a mechanism that drives the sharing of antigen-specific TCRs between individual mice. The proposal was based on statistical correlation studies in which 3 400 TCRβ chains from CD8+ T cells of inbred mice responding to the influenza A virus D(b)NP(366) and D(b)PA(224) epitopes were analyzed. The authors found that the sharing of both TCRβ amino-acid and TCRβ nucleotide sequences was negatively correlated with the prevalence of random nucleotide additions in the sequence. By contrast, the extent of TCRβ amino-acid sequence sharing among mice was strongly correlated with the level of diversity in the encoding nucleotide sequences, suggesting that a key feature of shared TCRs is that they can be made in a variety of ways. Through computer simulation, the authors estimated the relative production frequencies of TCRβ sequences and the variety of mechanisms by which they can be produced, and found strong correlations with the sharing of both TCRβ amino-acid sequences and TCRβ nucleotide sequences38. The same group further confirmed the role of convergent recombination in driving the sharing of TCR sequences in outbred macaques39 and humans40. By analyzing 6 000 TCRβ sequences specific for the immunodominant Mamu-A*01-restricted Tat-SL8/TL8 and Gag-CM9 epitopes of SIV in 20 outbred rhesus macaques, they observed that the spectrum of TCRβ sharing was negatively correlated with the minimum number of nucleotide additions required to produce the sequences and strongly positively correlated with the number of observed nucleotide sequences encoding the amino-acid sequences. TCRβ sharing was also correlated with the number of times and the variety of different ways that the sequences were produced in silico via random gene recombination39. Analyses of 2 836 TCRβ sequences from 23 CMV-infected and 10 EBV-infected individuals yielded similar results40.
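The idea behind such in silico estimates can be conveyed with a toy simulation. The sketch below is not the model used in the cited studies: it omits the D segment, uses hypothetical V and J coding ends, and assumes uniform distributions for deletions and additions. It records how often each nucleotide sequence is produced and by how many distinct recombination events.

```python
import random
from collections import Counter, defaultdict

# Hypothetical coding ends for illustration only; NOT real germline segments.
V_END = "TGTGCCAGCAGC"   # 3' end of an imaginary V segment
J_START = "GCAGAAACG"    # 5' end of an imaginary J segment


def recombine(rng, max_del=4, max_add=3):
    """One random V-J joining event: trim both coding ends, then add N nucleotides."""
    v_del = rng.randint(0, max_del)   # nucleotides removed from the V 3' end
    j_del = rng.randint(0, max_del)   # nucleotides removed from the J 5' end
    n_add = "".join(rng.choice("ACGT") for _ in range(rng.randint(0, max_add)))
    seq = V_END[:len(V_END) - v_del] + n_add + J_START[j_del:]
    return seq, (v_del, n_add, j_del)


def simulate(n_events, seed=0):
    """Count production frequency and distinct production mechanisms per sequence."""
    rng = random.Random(seed)
    production = Counter()            # sequence -> number of times produced
    mechanisms = defaultdict(set)     # sequence -> distinct (v_del, n_add, j_del) events
    for _ in range(n_events):
        seq, event = recombine(rng)
        production[seq] += 1
        mechanisms[seq].add(event)
    return production, mechanisms


production, mechanisms = simulate(20000)
germline_join = V_END + J_START
```

In this toy model the germline join can arise with no deletions and no additions, but also, for example, when one nucleotide is trimmed from the V end and the same base is restored by N addition, so `mechanisms[germline_join]` contains more than one event. Sequences that require fewer added nucleotides tend to be produced more often, which is qualitatively in line with the negative correlation between sharing and nucleotide additions described above.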

Because convergent recombination predicts that different TCR sequences are produced at different frequencies, the clonotypic frequencies of different TCRs are expected to vary widely. Indeed, this prediction was borne out by a recent study of the naïve CD8+ TCRβ repertoire in mice, which showed that TCRβ sequences with convergent features were present at higher copy numbers within individual mice and were also shared between individual mice. Thus, the clonotypic landscape of the naïve CD8+ T cell repertoire is largely determined by convergent recombination. Similar results in humans confirmed that convergent recombination shapes the clonotypic landscape in the TCR repertoires of the memory and naïve T cell pools, as well as their interrelationship within and between individuals34. The role of convergent recombination in shaping the intra-individual TCRβ clonotypic landscape and driving inter-individual TCRβ sharing was also demonstrated in DP thymocytes prior to MHC-mediated thymic selection (our unpublished data). It must be noted, however, that random convergent recombination alone is insufficient to account for the large overlap observed in the DP TCRβ repertoire, indicating the involvement of other mechanisms.

Recombinatorial biases

Although convergent recombination yields a statistically significant prediction about the extent of sharing of TCR sequences based on an unbiased, random recombination process, less than half of the overlap of DP TCRβ nucleotide sequence repertoires could be attributed to random convergent recombination (our unpublished data). Furthermore, some TCR sequences that are among the most likely to be produced by random convergent recombination are nevertheless present at lower clonotype frequencies and shared by fewer individuals than predicted32,38,39,40 (and our unpublished data), indicating preferences during recombination. Indeed, biases during recombination have been reported by many studies. Recombinatorial biases should contribute to the overlap between naïve TCR repertoires by preferentially generating a common subset of TCR sequences among individuals.

Preferences in the usage frequency and pairing of different V/D/J gene segments during TCR rearrangement have been observed extensively. Analyses of TCRβ sequences from several variable genes in human lymphocytes revealed skewed patterns of Vβ, Dβ, and Jβ region usage44. It has also been found that Jβ usage is not random in the human Vβ17 T cell repertoire prior to thymic selection43. Preferential pairing between Vβ genes, Dβ genes, and Jβ genes has also been shown45,46. Although biases observed in the post-selection repertoire might be confounded by thymic selection, most of the biases should represent preferences during initial recombination, which are maintained during intra-thymic selection (as discussed below). Indeed, a study of TCRα chains in human T cells demonstrated that Vα-Jα recombination in the thymus is not random. The TCRα chain diversity in peripheral T lymphocytes follows the same general patterns of rearrangement as observed in the thymus, and these patterns appear to be conserved among different individuals47. In mice, it was also found that TCR Dβ and Jβ gene segment usage is not random, but patterned at the time of recombination. Notably, the relative frequency of gene segment usage established during recombination is very similar to that found after thymic selection46. Moreover, biased Vβ usage by human CD4+ and CD8+ T cells in neonatal and adult donors is highly correlated between unrelated individuals, and the correlation in biased Vβ expression patterns between CD4+ and CD8+ T cells can be dominantly determined by germline TCRβ locus factors rather than thymic selection48. Other observed recombinatorial biases include the extent of the removal of nucleotides from the germline gene segments and the addition of specific “random” nucleotides. For example, the various V and J genes differ in the numbers of nucleotides removed from the 3′ end of the V gene segments and the 5′ end of the J gene segments, and base usage frequency at the N-additions is not random42,43 (and our unpublished data).

Detailed analyses of recombinatorial biases have been facilitated by recent high-throughput sequencing33,49,50,51,52 (and our unpublished data), which allows empirical TCRβ repertoires to be compared with simulated repertoires so that biases during recombination can be revealed. A simulated TCRβ repertoire should incorporate the effect of random convergent recombination, which assumes random nucleotide deletion at the coding ends of the germline segments and random nucleotide addition at the junctions within different Vβ-Jβ combinations. Figure 2A and 2B show that the pattern of nucleotide deletions at the coding ends differs between the empirical and simulated repertoires of DP thymocytes. A skew toward longer Dβ segments after recombination was also observed in the empirical repertoire compared with the simulated repertoire (Figure 2C). Base usage at the junctions in the empirical repertoire also differed from that in the simulated repertoire, with base C occurring at higher frequencies at the Vβ-Dβ junction (Figure 2D) and base G at the Dβ-Jβ junction (Figure 2E) in the empirical repertoire. Furthermore, different Vβ and Jβ segments showed different patterns of nucleotide deletion at the coding ends (our unpublished data), confirming a previous study showing that nucleotide deletion is influenced by base composition at the coding ends42. In addition, Vβ-Jβ and Dβ-Jβ combination usage in the DP repertoire was not random (our unpublished data). Overall, it is clear that TCR manufacture is not random. Biases in TCR gene usage and association, splicing, and terminal deoxynucleotidyl transferase activity all appear to combine and yield identical TCR structures within the naïve TCR repertoires of different individuals25.

Figure 2

Recombinatorial biases exemplified with TCR β nucleotide sequences within Vβ1-Jβ1-1 combination. Features of functional nucleotide sequences observed empirically or generated by simulation were compared. Features of the simulated repertoire are the expected values for a repertoire that is generated through a random convergent recombination process. Features of the empirical repertoire are shown for three individual mice. (A) Frequency distribution of simulated and empirical repertoires as a function of the number of nucleotide deletions at the 3′ end of the Vβ segment. (B) Frequency distribution of simulated and empirical repertoires as a function of the number of nucleotide deletions at the 5′ end of the Jβ segment. (C) Frequency distribution of simulated and empirical repertoires as a function of Dβ segment length after recombination. (D and E) Base usage of simulated and empirical repertoires at the Vβ-Dβ junction (D) or at the Dβ-Jβ junction (E). The error bars indicate SD. Correlations are based on Pearson's correlation coefficient.
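The kind of comparison summarized in Figure 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis behind the figure: the function names are hypothetical, the input values are invented, and the "simulated" distribution is simply typed in rather than produced by a recombination model.

```python
from collections import Counter
from math import sqrt


def base_usage(n_additions):
    """Fraction of A, C, G and T among all nucleotides added at a junction."""
    counts = Counter("".join(n_additions).upper())
    total = sum(counts[base] for base in "ACGT")
    return {base: counts[base] / total for base in "ACGT"}


def deletion_distribution(deletions, max_del):
    """Frequency of each deletion length from 0 to max_del."""
    counts = Counter(deletions)
    return [counts[d] / len(deletions) for d in range(max_del + 1)]


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Invented N-addition strings at a V-D junction; a random process would give 0.25 per base.
usage = base_usage(["CC", "GC", "C", "CCT", "AC"])

# Invented deletion counts at a V 3' end, compared with an invented simulated distribution.
empirical_del = deletion_distribution([0, 1, 1, 2, 2, 2, 3, 4], max_del=4)
simulated_del = [0.30, 0.25, 0.20, 0.15, 0.10]
r = pearson(empirical_del, simulated_del)
```

A base frequency far from 0.25, or a low correlation between the empirical and simulated deletion profiles, would be read as evidence of bias relative to the random model. In practice the comparison would be repeated per Vβ-Jβ combination and across individuals.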

The role of thymic selection in TCR sharing

During intra-thymic development, immature thymocytes are educated before migrating into the periphery and becoming naïve T cells. Only about 3% of thymocytes are positively selected and survive thymic selection, while the rest are eliminated through negative selection or death by neglect53. The number of unique TCRαβ pairs in naïve T cells is thus markedly reduced to about 2 × 10^6 in mice54 or 2 × 10^7 in humans55. Although thymic selection can dramatically limit the diversity of the TCR repertoire, its contribution to TCR sharing in naïve T cells depends on whether there is a common subset of TCR sequences that are preferentially and positively selected among different individuals, which is called “convergent evolution”33.

Preferences in thymic selection have been reported by multiple studies. Skewed Jβ usage between thymic CD4+CD8+ (DP) and lymph node CD4+ or CD8+ T cells43 and a slight shortening of CDR3 lengths during the transition from the DP stage to the CD4+ or CD8+ single-positive stage43,56,57,58 have been reported. One study utilized transgenic mice expressing a genomic TCR Vα locus consisting of only a single Vα gene segment and a few Jα gene segments. The analysis of pre-selection DP thymocytes from these mice showed a diverse array of TCR CDR3α sequences, while thymic selection produced a post-selection repertoire with marked overrepresentation of a subset of sequences, indicating that DP cells expressing particular CDR3α sequences might have quite different probabilities of being selected59. However, this interpretation is challenged by two observations: the sequencing depth of these studies is not sufficient to reveal the true extent of clonotypic frequency differences within the pre-selection repertoire32, and the hierarchy of clonotypic frequency is preserved during intra-thymic development (see discussion above).

It was reported that MHC class I- and class II-restricted TCRs can be distinguished by minute, single-residue changes in CDR3α, reflecting the positive selection of preferential TCR contacts for the recognition of MHC class I or class II molecules, respectively59. Structural studies also indicate that germline TCR V regions might have an inherent propensity to recognize conserved features found in the MHC α-helices, which could result in the preferential expression of certain V regions by CD4+ (MHC-class-II-restricted) or CD8+ (MHC-class-I-restricted) T cells60. Although there are indeed a few examples of TCR V region alleles or family members with a bias toward a particular MHC allele or class, in general, most Vα and Vβ elements can be found in TCRs that recognize any of the extremely polymorphic alleles and isotypes of MHCI and MHCII60. Therefore, it seems that positive selection in the thymus must choose receptors that can react with MHC from an immense collection of receptors with a large degree of randomness.

Although thymic selection might influence TCR sharing by both limiting (negative selection) and shaping (positive selection of preferred MHC-TCR-V-region interactions) the naïve TCR repertoire, recent studies from our lab and others strongly suggest that the role of thymic selection in TCR sharing is minor. High-throughput DNA sequencing revealed that the overlap in the naïve CD8+ TCRβ sequence repertoires of any two individuals appears to be independent of the degree of human leukocyte antigen matching33, and the TCRβ repertoire of murine DP thymocytes has almost the same recombination features as the naïve TCRβ repertoire (our unpublished data), indicating that thymic selection does not preferentially select for particular TCR sequences. Because they cannot encode functional TCRβ chains, non-functional TCRβ nucleotide sequences are not subject to thymic selection and thus should preserve initial recombination patterns. Deep sequencing analysis of the TCRβ repertoire of murine DP thymocytes revealed a highly similar usage of Vβ-Jβ combinations between functional and non-functional TCRβ nucleotide sequences. Similar usage of Vβ segments from DN3 (DN: CD4-CD8- double negative) thymocytes through DN4 and DP thymocytes was also observed61,62. All this evidence strongly suggests that β-selection and TCRαβ heterodimer formation do not favor any particular Vβ-Jβ combinations. In addition, biased Vβ usage by human CD4+ and CD8+ T cells in neonatal and adult donors is highly correlated between unrelated individuals, and the correlation in biased Vβ expression patterns between CD4+ and CD8+ T cells can be explained by germline TCRβ locus factors, but not by TCRβ allelic or HLA effects63.

Detailed analysis of available sequences from the DP TCRβ repertoire shows that functional and non-functional TCRβ nucleotide sequences are highly similar in terms of nucleotide deletions at the coding ends of the Vβ and Jβ segments, nucleotide additions and base usage at the Vβ-Dβ/Dβ-Jβ junctions, and the length of the rearranged Dβ segment after recombination (our unpublished data). Such similarities strongly argue against selection for particular CDR3 sequences. A comparison of the DP TCRβ repertoire with the naïve TCRβ repertoire demonstrated that the recombination features of the DP TCRβ repertoire were maintained through thymic selection into the naïve TCRβ repertoire (our unpublished data), suggesting that the influence of MHC-mediated selection is minimal. Furthermore, Vβ-Jβ combination usage by functional TCRβ nucleotide sequences in the human naïve repertoire was similar to that of non-functional TCRβ nucleotide sequences33, and β-chains were positively selected with similar efficiency regardless of CDR3 loop sequences64. Considering the effects of convergent recombination in shaping the intra-individual clonotypic landscape of TCRβ sequences in the naïve repertoire (as discussed above)32, it seems very likely that initial recombination patterns are preserved during intra-thymic development with no preferential selection for particular TCR sequences, and thus that thymic selection (convergent evolution) is unlikely to contribute much to the inter-individual overlap of the naïve TCR repertoire.
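The logic of using non-functional sequences as a selection-free baseline can be illustrated with a short sketch. It assumes a deliberately simplified definition of "functional" (in frame and without stop codons, read from the first nucleotide); real pipelines determine the frame relative to conserved V and J residues. The segment labels and sequences are hypothetical.

```python
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_functional(nt_seq):
    """Simplified criterion (an assumption): in frame and free of stop codons."""
    seq = nt_seq.upper()
    if len(seq) % 3 != 0:
        return False
    return all(seq[i:i + 3] not in STOP_CODONS for i in range(0, len(seq), 3))


def vj_usage(records, functional):
    """Relative usage of each (V, J) pair among sequences of the given status."""
    counts = Counter(
        (v, j) for v, j, seq in records if is_functional(seq) == functional
    )
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()} if total else {}


# Hypothetical records: (V segment label, J segment label, junction nucleotide sequence).
records = [
    ("Vb1", "Jb1-1", "TGTGCCAGCAGC"),  # in frame, no stop codon
    ("Vb1", "Jb1-1", "TGTGCCAGCAG"),   # out of frame
    ("Vb2", "Jb2-3", "TGTGCCAGCGGA"),  # in frame, no stop codon
    ("Vb2", "Jb2-3", "TGTGCCTGAAGC"),  # in frame but contains a TGA stop codon
]

functional_usage = vj_usage(records, functional=True)
nonfunctional_usage = vj_usage(records, functional=False)
```

If thymic selection favored particular Vβ-Jβ combinations, the two usage profiles would be expected to diverge; closely matching profiles, as reported in the text, argue that the pattern is set at recombination.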

Future challenges

Studies to date strongly suggest that the substantial sharing of TCRs among individuals is mainly determined by V(D)J recombination through convergent recombination and recombinatorial biases. What is intriguing is that V(D)J recombination is not the totally random process that would generate a more diverse repertoire within an individual. A fully random strategy would allow for massive TCR diversity across a species, thus benefiting the population as a whole. Could these recombinatorial biases result from natural selection involving co-evolution of host and pathogens? What is the biological utility of these recombinatorial biases, and can they be manipulated to the benefit of human beings?

V(D)J recombination has been shown to be regulated at multiple levels. Apart from cis-elements in the immune receptor loci, including recombination signal sequence, enhancers and promoters48,65, some trans-elements have been shown to play an important part in the regulation of V(D)J recombination66,67,68. Moreover, accumulating evidence has demonstrated the role of epigenetic factors in the regulation of V(D)J recombination, probably by altering the chromatin accessibility at the immune receptor loci66,69,70,71,72,73,74,75. Future investigations into the upstream signals that regulate those known downstream regulators of V(D)J recombination should be able to provide insights into how the V(D)J recombination process can be manipulated.

A fundamental question in studying the regulation of V(D)J recombination is whether it is a genetically programmed process that is inert to peripheral immune stresses, or is regulated in response to the immune state of the host. “Adaptive mutation”, a process in which organisms adaptively change their genetic information to facilitate their adaptation to stressful environments, has been well recognized76,77,78,79,80. Since V(D)J recombination generates a diverse immune receptor repertoire to specifically combat invading antigens, recombinatorial biases could be influenced by immune stresses and could have evolved to better fight common infections. On the other hand, public TCRs limit the diversity of TCRs, and this could make a population more vulnerable to rare pathogens. It has been observed that public TCRs appear less frequently in tumor-associated TCR repertoires than in pathogen-specific TCR repertoires25. Although the reason for this discrepancy remains unknown81, one could speculate that the presence of fewer anti-tumor public TCRs enables cancer to escape immune surveillance more easily. It is clear that the private TCR repertoire plays a very important role in fighting many diseases in each individual, including cancer. While reliance on private TCRs may leave many individuals in a population unable to resist a new pathogen, at least some individuals will be able to develop an adequate immune response to overcome it. Thus, it would be beneficial to everyone in the population if there were a way to convert a private TCR response into a public one.

Effects of the composition of TCR repertoire on disease pathogenesis have been reported. The autoimmunity of non-obese diabetic mice was linked to the selection of a low-diversity repertoire of natural regulatory CD4 T cells82. Decreasing repertoire diversity has been implicated in the age-associated decline in CD8 T cell immunity83. Future repertoire-wide studies into the causal relationship between TCR repertoire composition and disease pathogenesis are important as they could provide clues for applying public TCR responses in preventing and treating human diseases.

Although recombinatorial biases and convergent recombination are the two major determinants of the overlap between naïve TCR repertoires, much remains to be learned about the underlying mechanisms and biological relevance of recombinatorial biases. The effects of TCR sharing on both viral escape and disease should be a focus of future research. Future investigation should be aimed at better understanding the role of the TCR repertoire in immune responses. Ideally, we would be able to predict and manipulate the TCR repertoire to the benefit of human health.