The commitment of a developing thymocyte to the CD4+ helper or CD8+ cytotoxic T cell fate provides an important model for understanding cell fate decisions. The ultimate fate of a thymocyte is determined by the specificity of its T cell receptor (TCR) for major histocompatibility complex (MHC) molecules during positive selection in the thymus, with recognition of major histocompatibility complex class I (MHCI) leading to the CD8+ T cell fate and recognition of MHC class II (MHCII) leading to the CD4+ T cell fate. CD8 and CD4 are coreceptors for MHCI and MHCII, respectively, and their expression pattern has an important role in lineage commitment1,2,3,4. The ‘kinetic signaling’ model of lineage commitment focuses on TCR signal termination in CD8-fated cells in directing the lineage choice3. However, there is evidence that TCR signaling impacts thymocyte development throughout the >2-day process of positive selection5,6,7,8,9, and a clear, quantitative picture of the temporal pattern of TCR signaling throughout lineage specification has not emerged yet. It is also unknown whether different transcriptional targets of the TCR pathway are activated in a temporal and/or lineage-specific manner. Thus, the molecular links between TCR signaling and induction of the lineage-defining transcription factors THPOK (encoded by Zbtb7b in mice) and RUNX3 in mature CD4+ and CD8+ T cells, respectively10 remain unknown.

One complicating factor in addressing these questions is the diversity in TCR specificity and resulting cell fates, with many cells undergoing death by neglect, negative selection or agonist selection, alongside CD4-fated and CD8-fated cells. Even in mice bearing fixed rearranged TCRs (TCR transgenic (TCRtg)) that lead to a predetermined lineage choice, defining cell states and ordering them into a developmental trajectory remains a challenge. Traditionally, cell states have been characterized using flow cytometry to quantify cell surface markers1. Defining immature CD4+CD8+ (double positive (DP)) versus mature CD4+CD8 or CD4CD8+ (single positive (SP)) is relatively straightforward; however, further subdivisions can be complicated by the subjectivity of manual gating and limited number of markers. This approach also lacks the ability to make quantitative global comparisons in gene expression between cell states.

Advances in single-cell RNA sequencing (scRNA-seq) technologies have enabled the unbiased observation of transcriptional heterogeneity in the mammalian thymus11,12,13,14,15. Although these studies provided important insights, they did not have sufficient temporal resolution of the CD4 versus CD8 commitment process to connect TCR signaling events to the induction of THPOK, RUNX3 and the initiation of lineage-specific transcriptional programs. A high-resolution delineation of the differentiation process that provides connections to flow cytometry-based studies is needed to inform models of lineage commitment and to identify early drivers of lineage divergence.

To address these challenges, we leveraged CITE-seq16 and totalVI17 to build a high-resolution timeline of RNA and protein expression changes during positive selection. We identified two temporally distinct waves of TCR signaling: an early wave that is more sustained in CD4-fated cells, and a later wave that is specific to CD8-fated cells and that overlaps with CD8+ T cell lineage specification. We find that CD8-fated cells initially undergo a parallel, but transient, CD4 transcriptional program, implying that CD8-fated cells audition for the CD4+ T cell fate before undergoing CD8+ T cell lineage specification. We also identify TCR signaling through calcineurin–NFAT as a driver of the CD4+ T cell fate. These data provide an important resource for understanding T cell fate commitment in the thymus.


CITE-seq and totalVI form an RNA–protein thymocyte atlas

We profiled thymocytes from wild-type C57BL/6 mice (referred to as B6), CD4+ T cell lineage-restricted mice (hereafter CD4-fated), including AND and OT-II TCRtg mice, which express TCRs specific for MHCII, and B2m−/− (referred to as MHCI−/−), which have diverse TCR repertoires, and CD8+ T cell lineage-restricted mice (hereafter CD8-fated), including F5 and OT-I TCRtg mice, which express TCRs specific for MHCI, and H2-Ab1−/− (referred to as MHCII−/−), which have diverse TCRs. We measured transcriptomic and surface protein composition at the single-cell level using CITE-seq16 with a panel of 111 antibodies (Supplementary Data 1), which we jointly analyzed using totalVI17. We analyzed thymi from two mice per lineage-restricted genotype and five wild-type mice (Supplementary Data 2). Samples from the MHC-deficient mice and three of the five wild-type mice were sorted to enrich for CD5+TCRβ+ thymocytes undergoing positive selection (Extended Data Fig. 1a). totalVI integration of CITE-seq data (72,042 cells) stratified cells based on RNA and protein information (Fig. 1a) and identified thymocytes across developmental stages including CD4CD8 (double negative), proliferating DP, quiescent pre-selection DP, post-TCR-recombination DP receiving positive selection signals (DP (Sig.)), and immature and mature CD4+ and CD8+ T cells, along with negatively selecting and agonist-selecting populations (Fig. 1a). Wild-type, CD4-fated and CD8-fated samples were well mixed at early developmental stages, up to DP (Sig.) but branched into CD4+ and CD8+ T cell lineages in later-stage populations (Fig. 1b). We characterized cell populations with traditional markers (Extended Data Fig. 1b,c) and with unbiased totalVI differential expression tests (Fig. 1c,d and Supplementary Data 3). Top differentially expressed features included lineage surface markers (CD4, CD8), transcription factors (Foxp3, Zbtb7b) and markers of maturation stage (Rag1, Ccr4, S1pr1) (Fig. 1c,d). These multiomic definitions indicated continuous expression changes, particularly between the DP and SP stages (Fig. 1c,d), motivating analysis as a continuous developmental process.

Fig. 1: A joint transcriptomic and surface protein atlas of thymocyte development in wild-type and lineage-restricted mice.
figure 1

a,b, UMAP51 plots of the totalVI latent space from CITE-seq on total thymocytes from wild-type (WT), CD4-fated (MHCI−/−, OT-II and AND) and CD8-fated (MHCII−/−, OT-I and F5) mice, labeled by cell-type annotation (a) and mouse genotype (b). c,d, Heatmaps of markers derived from totalVI one-vs-all differential expression test between cell types for RNA (c) and proteins (d) in thymocytes from mice as in a. Values are totalVI denoised expression. e, UMAP plots of the totalVI latent space from positively selected thymocytes with cells labeled by mouse genotype. f, UMAP plots of the totalVI latent space from positively selected thymocytes. Cells colored by totalVI denoised expression of protein markers of lineage (CD4, CD8a), TCR signaling (CD5, CD69) and maturation (CD24, CD62L). g, Cells colored by totalVI denoised expression of RNA markers of TCR recombination (Rag1), thymic location (Cxcr4, Ccr7), and lineage regulation (Gata3, Zbtb7b, Runx3) as in f. GD T cells, gamma-delta T cells; DN, double negative; DP (P), double positive proliferating; DP (Q1), DP quiescent 1; DP (Q2), DP quiescent 2; DP (Sig.), DP signaled; Neg. sel. (1), negative selection 1; Neg. sel. (2), negative selection 2; Treg cell, regulatory CD4+ T cell; NKT cell, natural killer T cell.

We focused further analysis on positively selecting thymocytes from DP (Sig.) through mature stages (Fig. 1a). The totalVI latent space derived from these populations stratified thymocytes by developmental stage and CD4–CD8 lineage (Fig. 1e and Extended Data Fig. 1d). totalVI denoised expression of the proteins CD4 and CD8, markers of positive selection-induced TCR signaling (CD5, CD69) and maturation stage (CD24, CD62L), as well as RNA markers of TCR recombination (Rag1), location within the thymus (Cxcr4, Ccr7) and lineage regulation (Gata3, Zbtb7b, Runx3) showed the expected developmental and lineage-specific patterns18 (Fig. 1f,g). Thus, our data enabled a high-resolution analysis of the continuous developmental processes between the DP and SP stages.

Pseudotime clarifies intermediate developmental stages

Next, we performed pseudotime inference with Slingshot19,20 on the joint RNA-protein reduced dimension space to delineate the expression changes throughout positive selection along a branching trajectory (Fig. 2a, Extended Data Fig. 2a and Supplementary Data 4). The inferred pseudotime captured the timing of known expression changes18, such as early downregulation of Rag1, continuous downregulation of the early markers Ccr9 and Cd24a/CD24, transient expression of Cd69/CD69 and late upregulation of the maturation markers Klf2, S1pr1 and Sell/CD62L (Fig. 2b,c). To explore beyond known markers, we performed totalVI differential expression tests over pseudotime, and created a comprehensive timeline of RNA and protein expression changes for each lineage (Fig. 2d, Supplementary Data 5 and 6 and Supplementary Information). The lineages differed in their expression of key molecules, such as coreceptors and transcription factors (Fig. 2b), as expected, whereas markers of maturation followed similar trajectories in both lineages (Fig. 2c). Most significant differences over time were common across lineages (Fig. 2d), which enabled the investigation of lineage-specific differences at comparable developmental stages.

Fig. 2: Pseudotime inference captures continuous maturation trajectory and clarifies intermediate thymocyte stages.
figure 2

a, UMAP plot of the totalVI latent space from positively selected thymocytes with cells colored by Slingshot pseudotime and smoothed curves representing the CD4+ and CD8+ T cell lineages. b, Heatmap of RNA (top) and protein (bottom) markers of thymocyte development over pseudotime in the CD4+ and CD8+ T cell lineages. Features are colored by totalVI denoised expression, scaled per row, and sorted by peak expression in the CD4+ T cell lineage. Pseudotime axis is the same as in a. c, Expression of features in the CD4+ and CD8+ T cell lineages that vary over pseudotime. Features are totalVI denoised expression values scaled per feature and smoothed by loess curves. d, Heatmap of all RNA differentially expressed over pseudotime in any lineage. Features are scaled and ordered as in b. Labeled genes are highly differentially expressed over time (Methods). e, In silico flow cytometry plots of log(totalVI denoised expression) of CD8a and CD4 from positively selected thymocytes (left) and the same cells separated by lineage (right). Cells are colored by pseudotime. Gates were determined based on contours of cell density. WT, wild-type. f, In silico flow cytometry plot of data as in e separated by lineage and pseudotime. g, UMAP plot of the totalVI latent space from positively selected thymocytes with cells colored by gate. Cells were computationally grouped into eight gates using CD4, CD8a, CD69, CD127(IL-7Ra), and TCRβ. h, Histograms of cells separated by lineage and gate with cells colored by gate as in g. i, Stacked histograms of gated populations in CD4-fated (top) and CD8-fated (bottom) thymocytes, with thresholds classifying gated populations over pseudotime (Methods). j, Schematic timeline aligns pseudotime with gated populations, with population timing determined as in i.

Owing to the continuous nature of development and technical variations in marker detection, no consensus has emerged on how to define positive selection intermediates by flow cytometry1,2,21,22. To address this, we performed in silico flow cytometry on totalVI denoised surface protein expression to distinguish different pseudotime phases. CD4-fated cells progressed continuously in pseudotime from DP to CD4+CD8lo to CD4+CD8- (Fig. 2e,f and Extended Data Fig. 2b), whereas CD8-fated cells progressed from DP to CD4+CD8lo before reversing course to reach CD4CD8+23,24,25. Specifically, although at pseudotime 6–8 nearly all CD8-fated cells were CD4+CD8lo, at pseudotime 8–12, CD8-fated, but not CD4-fated, thymocytes passed through a DP phase again (referred to as DP3) (Fig. 2f). Although a later-time MHCI-specific DP3 population has been reported21, it is not commonly accounted for11,14,15, resulting in the contamination of the DP gate with later-time CD8-fated cells.

We used a data-driven strategy to identify a minimal set of surface markers capturing intermediate stages of positive selection and to better characterize the DP3 stage. Four pseudotime stages were separated by in silico gating on CD69 and CD127(IL-7Ra) independent of lineage: CD69loCD127lo, CD69+CD127lo, CD69+CD127+ and CD69CD127+ (Extended Data Fig. 2c,d). Addition of CD4 and CD8 markers enabled distinction between the lineages at later times (Fig. 2g,h). DP3 had high expression of TCRβ21,26 and was CD69+CD127+ (Extended Data Fig. 2d,e), compared to earlier CD69CD127 DP1 and CD69+CD127 DP2. In combination, a gating scheme using CD4, CD8, TCRβ, CD69 and CD127 resolved eight populations (DP1, DP2, CD4+CD8lo, DP3, semimature CD4 (CD4 SM) and mature CD4 (which combined corresponded to CD4 SP), and semimature CD8 (CD8 SM) and mature CD8 (which combined corresponded to CD8 SP)) (Fig. 2g,h) and could enable the approximation of this separation by time and lineage using flow cytometry (Fig. 2i,j). Fluorescence-based flow cytometry replicated these eight CITE-seq-derived gates, supporting the presence of the proposed intermediate stages (Extended Data Fig. 2f–h). Collectively, these observations specified an updated model of positive selection intermediates in the CD4+ and CD8+ T cell lineages.

CITE-seq reveals order of key lineage commitment events

Prolonged TCR signaling is known as a driver of CD4+ T cell lineage commitment, whereas the role of TCR signals in CD8 SP development remains controversial1,2,3. To gain insight into CD4-CD8 lineage commitment, we used pseudotime to characterize expression changes in TCR signaling targets, key transcription factors and coreceptors involved in this process. As expected, we observed that RNA expression often preceded the corresponding change in protein expression over pseudotime, as seen for Cd69/CD69 (Extended Data Fig. 3a), likely due to the time needed for protein translation, transport and degradation. As expected, the TCR response (exemplified by expression of TCR targets Cd69 and Egr1) became significantly higher in CD4-fated cells early in pseudotime (by bin 4–5, within DP2) (Fig. 3a,b). This pattern reversed at later pseudotimes, with higher TCR responses in CD8-fated cells at pseudotime bin 9–10 (within DP3) (Fig. 3a,b). This suggested two distinct waves of TCR signaling during positive selection: a broader initial wave in CD4-fated cells and a later wave, specific for CD8-fated cells (Fig. 3a and Extended Data Fig. 3b).

Fig. 3: Paired measurements of RNA and protein reveal the timing of major events in CD4-CD8 lineage commitment.
figure 3

a, Expression over pseudotime of TCR signaling response molecules. Features are totalVI denoised expression values scaled per feature and smoothed by loess curves. Schematic timeline below the plot aligns pseudotime with gated populations (see Fig. 2j). b, Differential expression over pseudotime between CD4-fated and CD8-fated thymocytes for RNA of TCR signaling response molecules. Non-significant (NS) differences are gray, significant differences are filled circles. Size of the circle indicates log(Bayes factor). Data are median log fold change ± totalVI-computed s. d. c, Expression over pseudotime as in a for RNA of key transcription factors. d, Differential expression over pseudotime as in b for RNA of key transcription factors. e, In silico flow cytometry plots of log(totalVI denoised expression) of Runx3 and Zbtb7b from positively selected thymocytes separated by lineage and colored by pseudotime. f, Expression over pseudotime as in a for coreceptor RNA and protein. g, Differential expression over pseudotime as in b for RNA of coreceptors. Significant RNA results are filled circles and significant protein results are open circles. b,d,g, n = 9,545 CD4-fated cells (MHCI−/−, OT-II and AND mice) and 9,126 CD8-fated cells (MHCII−/−, OT-I and F5 mice).

We next used pseudotime to identify the divergence of transcription factors between the lineages and relate these expression patterns to the timing of TCR signaling. We focused on the lineage-defining transcription factors Runx3 (CD8+ T cell lineage), Zbtb7b (CD4+ T cell lineage) and Gata3 (upstream activator of Zbtb7b that is more highly expressed in the CD4+ T cell lineage)10,27. The differential upregulation of Gata3 in CD4-fated cells coincided with differential expression of the first TCR response wave in pseudotime bin 4–5 (within DP2), followed by differential upregulation of Zbtb7b in bin 7–8 (within CD4+CD8lo) (Fig. 3c,d). Differential upregulation of Runx3 in CD8-fated cells occurred between pseudotimes 8 and 10, overlapping with DP3 and the second rise in TCR signaling (Fig. 3d). Intracellular flow cytometry in wild-type thymocytes supported the observed timing in differential expression of these transcription factors (Extended Data Fig. 3c). IL-7 and other STAT5 activating cytokines were reported to promote Runx3 upregulation and the CD8+ T cell fate3, but we did not observe a lineage-specific increase in STAT5 target gene expression correlating with Runx3 upregulation (Extended Data Fig. 3d).

We observed that Gata3 induction was followed by a rise in Zbtb7b in both lineages, although their expression was lower and more transient in CD8-fated cells compared to CD4-fated cells (Fig. 3c,d and Extended Data Fig. 3b). This suggested that both lineages initially ‘audition’ for the CD4+ T cell fate, although MHCI-specific cells do so unsuccessfully. A large rise in Runx3 expression, which occurred only in CD8-fated cells, overlapped with the decrease in Zbtb7b at pseudotime 7–11 (CD4+CD8lo and DP3 stages) (Fig. 3c,d), implying that both transcription factors may be transiently co-expressed in CD8-fated cells, in spite of their ability to repress each other’s expression and their reported mutually exclusive expression at later stages10,28. In silico flow cytometry of Zbtb7b and Runx3 showed that both CD4- and CD8-fated thymocytes initially upregulated Zbtb7b, whereas CD8-fated thymocytes subsequently downregulated Zbtb7b, simultaneous with Runx3 upregulation (Fig. 3e and Extended Data Fig. 4a). To test the co-expression of THPOK and RUNX3 in CD8-fated cells, we performed intracellular flow cytometry in OT-I mice, which have a prominent CD4+CD8lo population23. Wild-type and OT-II mice were included for comparison. We observed a small population of cells co-expressing THPOK and RUNX3 in the positively selecting OT-I thymocytes and wild-type thymocytes, but not in the OT-II thymocytes (Extended Data Fig. 4b). The expression of RUNX3 in THPOK+ OT-I thymocytes was substantially lower than in mature CD8+ thymocytes, but was significantly above background, as determined by fluorescence minus one controls and staining of THPOK+ OT-II thymocytes (Extended Data Fig. 4c–e). These data suggest that OT-I thymocytes contain a population that recently failed the CD4 audition and were transitioning towards CD8+ T cell lineage specification.

The pattern of coreceptor expression and its impact on TCR signaling is a key factor in CD4-CD8 lineage commitment1,2,3,4. CD4 and CD8 both exhibited an initial dip in expression (the ‘double dull’ stage24), followed by a rise in CD4 and a continued decrease in CD8 expression (Fig. 3f). In CD8-fated cells, CD8 expression recovered as CD4 expression decreased, resulting in the DP3 stage as the cells progressed towards CD8 SP (Fig. 3f). Cd8a became significantly differentially expressed between lineages at pseudotime 6, which corresponded to the rise in Zbtb7b, and Cd4 became differentially expressed at pseudotime 9, which corresponded to the preferential expression of Runx3 in the CD8+ T cell lineage (Fig. 3g). In CD4-fated cells, CD4 expression remained relatively high and was not correlated with expression of Cd69 and Egr1 (Fig. 3a,f). The gradual decline in TCR signal after pseudotime 3 was likely due to negative feedback, including induction of the ERK signaling inhibitors Dusp2/5 (Extended Data Fig. 3b)29,30. In contrast, in CD8-fated cells, the faster decline in TCR signaling during the first wave coincided with declining CD8 expression (Fig. 3a,f), as predicted by the kinetic signaling model3. Moreover, the second rise in TCR signaling (pseudotimes 8–10) correlated with the rise in CD8 expression (Fig. 3a,f). Thus, the role of CD8 in facilitating MHCI recognition, together with other factors that increase thymocyte sensitivity to TCR signals at later developmental stages7,9,31 could explain the second TCR signaling wave. Together, these analyses indicated the existence of an initial CD4+ T cell lineage auditioning phase for both MHCI- and MHCII-specific thymocytes and were consistent with a role for TCR signaling in late CD8+ T cell lineage specification5,6,7,8 (Extended Data Fig. 5).

Differential expression implicates lineage drivers

To systematically investigate CD4-CD8 lineage divergence, we performed totalVI differential expression tests between lineage-restricted thymocytes within equivalent units of pseudotime. We found no substantial differences in RNA or protein expression between the lineages at the early DP stages, but differential expression accumulated throughout maturation (Fig. 4a, Supplementary Data 7 and Supplementary Information). This analysis resulted in a set of 302 genes with significantly higher expression in CD4-fated thymocytes (hereafter CD4-DE), 397 genes with significantly higher expression in CD8-fated thymocytes (CD8-DE) and 92 genes with higher expression in each lineage in at least one pseudotime unit that were included in both sets (Extended Data Fig. 6a,b). The genes in each set were clustered by their expression in cells of the corresponding lineage (Fig. 4b,c and Supplementary Data 8 and 9). Inspection of mean gene expression of each cluster over pseudotime reflected temporal variations in expression (Fig. 4b–d). For example, CD4-DE cluster 5 and CD8-DE cluster 1 showed a late divergence in expression of transcription factors and genes related to effector functions in their respective lineages (Zbtb7b and Cd40lg in CD4-fated cells, and Runx3 and Nkg7 in CD8-fated cells). Three clusters (CD4-DE clusters 4 and 7 and CD8-DE cluster 3) were significantly enriched for TCR target genes (hypergeometric test, Benjamini–Hochberg -adjusted P < 0.05). CD4-DE cluster 7 and CD8-DE cluster 3 contained overlapping TCR target genes, including Cd69 and Egr1, and showed an early expression peak that was more sustained in CD4-fated relative to CD8-fated cells, and a second peak, specifically in CD8-fated cells (Fig. 4b–d). TCR target genes in CD4-DE cluster 4, including Cd5 and Gata3, displayed a similarly early, single peak, which was more sustained for CD4-fated cells (Fig. 4b,d). CD8-DE clusters 0 and 4 exhibited increased expression in CD8-fated cells just before the second rise in TCR signaling and contained genes implicated in modulating TCR sensitivity (Fig. 4c). These included Cd8a, which is required for MHCI recognition and Themis, which modulates TCR signal strength during positive selection32 in CD8-DE cluster 0, and the ion channel component genes Kcna2 and Tmie, which contribute to enhancing TCR sensitivity in thymocytes with low self-reactivity31, in CD8-DE cluster 4.

Fig. 4: Gene expression differences between CD4-fated and CD8-fated cells implicate putative drivers of lineage commitment.
figure 4

a, Number of differentially expressed features between CD4-fated (MHCI−/−, OT-II and AND) and CD8-fated (MHCII−/−, OT-I and F5) cells across pseudotime. b, Genes (RNA) upregulated in CD4-fated versus CD8-fated cells scaled per gene and clustered by the Leiden algorithm according to expression in CD4-fated cells. Expression over pseudotime per cluster is displayed as the mean of scaled totalVI denoised expression per gene for genes in a cluster, smoothed by loess curves. c, As in b, but for genes upregulated in CD8-fated cells, clustered according to expression in CD8-fated cells. d, Expression over pseudotime of selected TCR target genes that are differentially expressed between the two lineages and putative targets of Nfatc2, according to ChEA3. Asterisk indicates genes differentially expressed during the time windows used for ranking in e. totalVI denoised expression values are scaled per gene and smoothed by loess curves. e, Transcription factor enrichment analysis by ChEA3 for CD4-lineage-specific differentially expressed genes (left). Transcription factors are ranked by mean enrichment in the three pseudotime bins before Zbtb7b differential expression (pseudotime 4–7; pseudotime 7–8 is for visualization only). Gray indicates genes detected in less than 5% of cells in the relevant population. ‘Differentially expressed’ indicates significant upregulation in at least one of the relevant time bins. ‘Targets master regulator’ indicates a transcription factor that targets either Gata3, Runx3, or Zbtb7b in ChEA3 databases. ‘TCR pathway’ indicates membership in NetPath34 TCR Signaling Pathway, genes transcriptionally upregulated by TCR signaling, or genes with literature support for TCR pathway membership. Right column is the same as the left but for CD8 lineage-specific differentially expressed genes, with ranking by mean enrichment in the three pseudotime bins before Runx3 differential expression (pseudotime 5–8; pseudotime 8–9 is for visualization only). f, Transcription factor enrichment analysis for TCR target-enriched gene clusters. The top 15 transcription factors enriched in the gene sets are shown. Colors represent transcription factors activated by the respective branch of TCR signaling. Gray indicates additional transcription factors associated with TCR signaling based on Netpath34. In e and f, lower ranks and lower scores are better (meaning more enrichment).

To identify which transcription factors may influence lineage commitment, we focused on pseudotimes 4–7, just after differential gene expression was first detected and before Zbtb7b induction, for the CD4+ T cell lineage, and pseudotimes 5–8, just before Runx3 induction, for the CD8+ T cell lineage. We performed transcription factor enrichment analysis with ChEA3, which identifies the transcription factors most likely to explain the expression of a set of target genes33. We used differentially expressed genes between lineages in each unit of pseudotime as the target gene sets (Fig. 4e and Supplementary Data 10 and 11). For each transcription factor, we also considered known associations with TCR signaling34, evidence that it regulates Gata3, Zbtb7b or Runx33, and whether the transcription factor itself was differentially expressed at the relevant pseudotime. In CD4-lineage cells, several top-ranked transcription factors by ChEA3 were associated with TCR signaling pathways (Egr2, Nfatc2, Egr1, Nfatc1 and Rel) (Fig. 4e). The top two in CD4-fated cells were Egr2 and Nfatc2, which lie downstream of the extracellular signal-regulated kinase branch (hereafter MEK-ERK) and calcineurin-NFAT branch, respectively35,36,37,38, two of the three main branches of the TCR signal transduction pathway.

To explore how TCR signaling associated with divergent transcriptional regulation between the two lineages, we examined genes in CD4-DE clusters 4 and 7 and CD8-DE cluster 3, which all showed an early peak that corresponded to the CD4 audition phase in pseudotime. CD4-DE cluster 4 contained Gata3, a target of the TCR-associated transcription factor NFAT34,39,40, exhibited more transient expression and lacked a prominent second peak in CD8-fated cells (Fig. 4b), implying these genes were regulated by a branch of the TCR signaling pathway that was selectively active early during the CD4 audition. ChEA3 analysis of the genes in CD4-DE cluster 4 showed enrichment for NFAT family member Nfatc2, with Gata3, Cd5, Id3, Cd28 and Lef1 contributing to the enrichment score (Fig. 4f and Supplementary Data 12). By contrast, CD4-DE cluster 7 and CD8-DE cluster 3 showed enrichment for the AP-1 transcription factors Fosb and Junb, NF-κB family members Rel and Nfkb1/2, and MEK-ERK target Egr1 (Fig. 4f, Extended Data Fig. 6c and Supplementary Data 13). This suggested that all three branches of the TCR signaling pathway participated during the CD4 audition, whereas only MEK-ERK and PKC-NF-κB were active in the later CD8 specification. Thus, ChEA3 analyses implicated NFAT in driving early TCR-induced transcriptional differences between lineages (Fig. 4e).

Calcineurin-NFAT promotes CD4+ T cell lineage via GATA3

Genetic disruption of the calcineurin B1 regulatory subunit in thymocytes, or 10-day in vivo treatment with calcineurin inhibitors leads to a developmental defect in DP thymocytes that obscures a possible role of calcineurin downstream of TCR signals during positive selection41. Because mature SPs first appear shortly after birth in mice, we used ex vivo culture of thymic tissue slices from postnatal day 1 mice to inhibit TCR signaling during a new wave of CD4 and CD8 SP development. Thymic slices cultured for 0 or 24 h contained mostly DP thymocytes, whereas frequencies of CD4 SP, CD8 SP, CD4+CD8lo and CD4 SM increased between 48 and 96 h (Extended Data Fig. 7a,b). As expected, CD8+ T cell development was slightly delayed compared to that of CD4+ T cells21,42 (Extended Data Fig. 7b). Treatment of wild-type cultures with 200 ng ml−1 or less of the calcineurin inhibitor cyclosporin A (CsA) for 96 h did not impact the relative size of most thymocyte populations while leading to a selective and dose-dependent reduction in CD4+CD8lo and CD4 SM thymocytes (Fig. 5a). A similar reduction in CD4+CD8lo thymocytes was observed in CsA-treated MHCI−/− cultures, whereas neonatal slice cultures from MHCII−/− mice had a reduced CD4+CD8lo population that was not impacted by CsA (Extended Data Fig. 7c). Time course analyses in wild-type cultures showed that the reduction in CD4+CD8lo and CD4 SM cells became significant after 72 and 48 h of culture, respectively (Fig. 5b).

Fig. 5: Calcineurin blockade impairs new CD4 SP development and GATA3 induction.
figure 5

a, Frequency (% of live cells) of CD4+CD8lo, CD4 SM, CD4 SP or CD8 SP in thymic slices from postnatal (day 1) wild-type mice cultured with in 50, 100 or 200 ng ml−1 cyclosporin A (CsA) for 96 h. Data were compiled from three independent experiments and analyzed using an ordinary one-way ANOVA. No CsA, n = 10; 50 ng ml−1, 100 ng ml−1 and 200 ng ml−1, n = 7 each. Gating strategy, Extended Data Fig. 7a. b, Frequency (% of live cells) of CD4+CD8lo and CD4 SM cells after 0, 24, 48, 72 and 96 h culture in medium alone or with 200 ng ml−1 CsA as in a. Data were compiled from nine independent experiments. No CsA: 0 h, n = 6; 24 h, n = 9; 48 h, n = 10; 72 h, n = 22; 96 h, n = 10. 200 ng ml−1 CsA: 24 h, n = 6; 48 h, n = 7; 72 h, n = 14; 96 h, n = 7. Data were analyzed using an ordinary two-way ANOVA with multiple comparisons. c, Frequency (% of live cells) of EdU+ DP, EdU+ CD4+CD8lo, and EdU+ CD4 SP thymocyte populations from AND mice (i.p.) injected with EdU, rested for 16 h, and then treated with 1 (24 h) or 2 (48 h) 5 μg doses of FK506 (i.p.) every 24 h. Each dot represents one mouse. Data were pooled from three independent experiments. FK506 mice, n = 5; control, n = 4. d,e, Geometric mean fluorescent intensity (gMFI) of GATA3 detected by intracellular flow cytometry staining in CD69+ DP thymocytes from wild-type neonatal thymic slice cultures after 48-h treatment with 200 ng ml−1 CsA (d) or 72-h treatment with 6.3 ng ml−1 FK506 (e). Data were compiled from three (d) or two (e) independent experiments. For d, no CsA, n =  10 ; 200 ng ml−1CsA, n = 10. Fore, no FK506, n = 6; 6.3 ng ml−1 FK506, n = 6. Each symbol represents a thymic slice. Data were analyzed using an unpaired, two-sided t-test. norm., normalized; NS, not significant. Error bars indicate mean ± s.e.m.

To investigate why CsA reduced CD4+CD8lo and CD4 SM, without impacting the overall number of CD4 SP, we used EdU to label a cohort of proliferating DP thymocytes that just completed TCRβ selection, and we followed those cells for 2 days in the presence or absence of calcineurin blockade (Extended Data Fig. 7d,e). We used the calcineurin inhibitor FK506, which blocks positive selection without the loss of DP thymocytes observed with CsA43. Adult AND mice were injected i.p. with 1 dose of EdU followed by i.p. injection with FK506 every 24 h starting 16 h post-EdU administration. In control mice that were injected with EdU but not FK506, the percentage of EdU+ CD4 SP increased from ~2% to 4% between 24 and 48 h (Fig. 5c), reflecting the conversion of labeled thymocytes from DP to CD4 SP during the time course. FK506 treatment had no significant impact on the overall percentage of DP or CD4 SP thymocytes (Extended Data Fig. 7f), and a 24-h treatment had no significant impact on the number of EdU+ DP or CD4 SP compared to samples without FK506 (Fig. 5c). However, a 48-h treatment with FK506 led to a significant reduction in the EdU+ CD4 SP, CD4+CD8lo and DP thymocytes (Fig. 5c), suggesting a reduction of newly developed CD4 SP. To confirm that the reduction in CD4 SP development was not an indirect consequence of impaired ERK activation41, we stimulated thymocytes from FK506-treated mice by TCR crosslinking for 2 min, followed by flow cytometry to detect phosphorylated ERK (p-ERK). Strong p-ERK induction in DP thymocytes was detected in 24- and 48-h FK506-treated mice, similar to untreated controls (Extended Data Fig. 7g). Together these data indicated that blockade of calcineurin activation downstream of TCR during positive selection prevented new CD4 SP development but did not interfere with already selected CD4 SP thymocytes.

To test whether CsA treatment prevented CD4 development by interfering with GATA3 induction, we quantified GATA3 expression in neonatal slice cultures treated with CsA for 48 h. We observed a significant reduction in GATA3 expression in CD69+ DP from CsA-treated cultures compared to untreated cultures (Fig. 5d). In addition, neonatal slice cultures treated with moderate concentrations (0.4–6 ng ml−1) of FK506 exhibited a loss of CD4+CD8lo and CD4 SM thymocytes without a significant loss of mature CD4 or CD8 SP (Extended Data Fig. 7h), similar to CsA. FK506 treatment also led to a significant reduction in GATA3 protein expression in CD69+ DP (Fig. 5e). Together, these data implicated the calcineurin-NFAT-GATA3 axis as a link between TCR signals downstream of MHCII recognition and commitment to the CD4+ T cell lineage.

Calcineurin blockade selectively impacts the CD4 audition

To investigate whether the calcineurin-NFAT branch of the TCR signaling pathway had a selective role in the CD4 audition, we compared the impact of calcineurin versus MEK inhibition in neonatal thymic slice cultures. We used relatively low CsA (200 ng ml−1) and U0126 (2 or 10 μg ml−1) concentrations based on titration experiments (Extended Data Fig. 8a) to avoid off-target effects. We combined flow cytometric analyses of seven cell surface proteins (CD4, CD8α, CD5, TCRβ, CD69, CD24 and CD127) and three lineage-defining transcription factors (GATA3, THPOK and RUNX3) with a computational multidimensional gating approach by unsupervised clustering to define continuous developmental intermediates in an unbiased manner. These analyses identified populations that largely overlapped with the CD4 SP, CD8 SP and DP populations defined by manual gating, as well as a transitional population that largely overlapped with the CD4 SM population and also included some CD4+CD8lo cells (Fig. 6a and Extended Data Figs. 8b,c and 9a,b). Smaller populations of αβTCR and mature, unconventional T cells were also detected (Extended Data Fig. 9a). Wild-type cultures treated with CsA for 48 or 72 h had a significant reduction in the CD4 transitional population, with little impact on other populations, compared to untreated controls (Fig. 6b,c and Extended Data Fig. 9c,d). Cultures treated with U0126 at 10 μg ml−1 had a loss of CD4 SP and CD8 SP as well as the CD4 transitional population, whereas those treated with 2 μg ml−1 U0126 had normal numbers of transitional CD4 cells and slightly reduced CD4 SP and CD8 SP (Fig. 6b,c and Extended Data Fig. 9c,d). Similar results were obtained with manual gating (Fig. 6d,e) and suggested that calcineurin inhibition impacted a relatively restricted temporal window during positive selection that corresponded to the CD4 audition, whereas MEK inhibition impacted all stages of positive selection, including CD8 specification.

Fig. 6: Calcineurin inhibition selectively impacts the CD4 audition.
figure 6

a, UMAP plots of individual thymocytes from thymic slices of postnatal (day 1) wild-type mice cultured with no drug or CsA (200 ng ml−1) or U0126 (2 μg ml−1 or 10 μg ml−1) for 48 or 72 h. DP and SP populations are highlighted. b, UMAP plots for the 48-h time point separated into no drug, 200 ng ml−1 CsA, 2 μg ml−1 or 10 μg ml−1 U0126 conditions as in a. c, Scatter plots showing the log fold change in cell-type proportions relative to no-drug control for DP1, DP2a, DP2b, DP2c, CD4 transitional, CD4 SP and CD8 SP cell clusters for no drug, 200 ng ml−1 CsA, 2 μg ml−1 or 10 μg ml−1 U0126 conditions. Each dot is a separate thymic slice, and data are normalized to the mean of the no-drug condition. Error bars indicate mean ± s.e.m. d,e, Frequency (% of live cells) in CD4 SM, CD4 SP and CD8 SP populations normalized to untreated control at 48 h (d) or 72 h (e) of culture as in a, assessed by manual gating (Extended Data Fig. 7a). Error bars indicate mean ± standard error of the mean. f,g, Representative flow cytometry plots (f) and compiled data (g) showing the expression of CD5 and GATA3 in gated DP thymocytes from thymic slices cultured for 48 h with no drug, 200 ng ml−1 CsA, 2 μg ml−1 or 10 μg ml−1 U0126 as in a. ac, Data are from one representative experiment out of two. d,e,g, Each dot represents a thymic slice; data are compiled from three independent experiments for CsA (200 ng ml−1), n = 10; one experiment for U0126 low (2 μg ml−1), n = 3; two independent experiments for U0126 high (10 μg ml−1), n = 6; and four independent experiments for no drug, n = 13.

Calcineurin inhibition by CsA also impacted two computationally defined DP clusters (DP2b and DP2c) (Fig. 6c), that differed in their relative expression of CD5 (Extended Data Fig. 8c). Because CD5, along with GATA3, was one of the putative NFAT targets based on transcription factor enrichment analyses (Fig. 4e and Supplementary Data 12), we used manual gating in DP thymocytes to compare the expression of CD5 and GATA3 in neonatal slice cultures. We observed a small population of DP thymocytes with high expression of CD5 and GATA3 in wild-type cultures, which was proportionally decreased by treatment with CsA for 48 h, and increased by treatment with 2 μg ml−1 U0126 (Fig. 6f,g). Together, these data indicated that calcineurin-NFAT promoted strong induction of GATA3 and CD5 during the CD4 audition, thus promoting CD4+ T cell fate commitment, whereas MEK-ERK signaling provided more general differentiation and survival signals throughout positive selection.


Here, we applied single-cell multiomic analysis to generate a high-resolution timeline of RNA and surface protein expression throughout T cell maturation in the thymus. We identified an initial CD4 auditioning phase in which both CD4-fated and CD8-fated cells undergo a parallel induction of a CD4+ T cell differentiation program during a first TCR signaling wave, followed by a second TCR signaling wave that is specific for CD8-fated cells and overlaps with induction of a CD8+ T cell differentiation program. Our data confirmed and extended earlier analyses based on more limited sets of markers15,21,28,42,44, providing a comprehensive picture of events during CD4+ and CD8+ T cell development that can serve as a resource for future studies.

We used a high-resolution timeline to dissect the activity of TCR-regulated transcription factors during lineage commitment. Previous studies showed that the MEK-ERK branch of the TCR signaling pathway has a crucial role during positive selection9,45,46,47, the NF-κB branch is not required for positive selection48 and the role of the calcineurin-NFAT branch is unclear41. Our analyses showed that NFAT activity is likely to account for some of the earliest lineage-specific RNA differences and was most prominent during the CD4 audition, whereas the inferred activity of NF-κB and the MEK-ERK regulated factors AP-1 and EGR1/2 occurred throughout positive selection. Although previous work showed that long-term loss of calcineurin-NFAT activity impairs the ability of DP thymocytes to activate ERK upon TCR triggering41, we found that short-term, low-dose exposure to calcineurin inhibitors, conditions that did not impair ERK activation, prevented new development of mature CD4+ T cells and decreased the expression of GATA3 in DP thymocytes. These data are consistent with earlier studies implicating NFAT as a positive regulator of GATA339,40, and GATA3 as a major driver of the CD4 fate49, which points to a TCR–calcium–calcineurin–NFAT–GATA3 axis in driving CD4+ T cell lineage commitment.

The timeline of RNA and protein expression changes presented here provided a useful framework for understanding how TCR specificity for MHCI versus MHCII directs T cell fate. In particular, the notion of successive windows of opportunity for CD4+, followed by CD8+ T cell fate determination, corresponding with distinct waves of TCR signaling is consistent with a ‘sequential selection’ model for lineage determination50. In this model, the CD4 audition serves as an initial selection step to ensure a match between CD4 coreceptor expression and an MHCII-specific TCR, whereas the second TCR signaling wave provides an additional selection step to ensure a match between CD8 expression and an MHCI-specific TCR. During the CD4 audition, thymocytes bearing MHCII-specific TCRs experience a relatively sustained first wave of TCR signaling, allowing them to lock in the CD4 fate. On the other hand, thymocytes bearing MHCI-specific TCRs experience a more transient first signaling wave, due in part to the drop in CD8 coreceptor expression (in line with kinetic signaling3), resulting in a failed CD4 audition. After the failed audition, MHCI-specific thymocytes experience a second wave of TCR signaling driving CD8 fate specification. The notion of a second TCR-driven selection stage for CD8-fated cells fits with prior evidence for a prolonged requirement for TCR signaling for CD8+ T cell development5,6,7,9 but is at odds with the kinetic signaling model, which invokes a complete loss of TCR signals and an exclusive role for cytokine signals during CD8+ T cell lineage specification3.

Although the current study focused on T cell lineage commitment, the approach used here has broader utility for studying developmental systems. The simultaneous measurement of RNA and protein not only allowed us to track the differences in relative timing of RNA and protein expression events but also can inform the design of high-dimensional flow cytometry studies for further analyses of developmental intermediates. Future work that integrates spatial measurements with our transcriptomic and proteomic profiles would provide valuable information about how signals from the tissue environments impact cell fate decisions.



All animal care and procedures were carried out in accordance with guidelines approved by the Institutional Animal Care and Use Committees at the University of California, Berkeley and at BioLegend, Inc. Wild-type (B6) (C57BL/6, 000664), B2m–/– (B6.129P2-B2mtm1Unc/DcrJ, 002087; referred to as MHCI−/−), OT-I (C57BL/6-Tg(TcraTcrb)1100Mjb/J, 003831), and OT-II (B6.Cg-Tg(TcraTcrb)425Cbn/J, 004194) were obtained from The Jackson Laboratory. MHCII−/− (H2-Ab1−/−, also known as I-Aβ−/−) mice have been previously described52. AND TCRtg Rag1−/− mice and F5 TCRtg Rag1−/− mice were generated by crossing AND TCRtg (B10.Cg-Tg(TcrAND)53Hed/J, 002761)53 and F5 TCRtg (C57BL/6-Tg(CD2-TcraF5,CD2-TcrbF5)1Kio)54 mice with Rag1−/− mice (Rag1−/−B6.129S7-Rag1tm1Mom) as previously described7).

CITE-seq experiment

All mice used in CITE-seq experiments were females between 4 and 8 weeks of age. Samples are further described in Supplementary Data 2. Mice were group housed with enrichment and segregated by sex in standard cages on ventilated racks at an ambient temperature of 26 °C and 40% humidity. Mice were kept in a dark/light cycle of 12 h on and 12 h off and given access to food and water ad libitum. For cell preparation, mice were euthanized and thymi were harvested, placed in RPMI with 10% FBS medium on ice, mechanically dissociated with a syringe plunger and passed through a 70 μm strainer to generate a single-cell suspension.

For antibody panel preparation, we prepared a panel containing 111 antibodies (TotalSeq-A mouse antibody panel 1, BioLegend, 900003217), which are enumerated in Supplementary Data 1. Immediately before cell staining, we centrifuged the antibody panel for 10 min at 14,000g to remove antibody aggregates. We then performed a buffer exchange on the supernatant using a 50 kDa Amicon spin column (Millipore, UFC505096) following the manufacturer’s protocol to transfer antibodies into RPMI with 10% FBS.

To enrich for positively selecting thymocytes in MHC-deficient and some wild-type samples (Supplementary Data 2), live, single TCRβ+CD5+ thymocytes were sorted by FACS. We took advantage of the fact that cells were already stained with TotalSeq (oligonucleotide-conjugated) antibodies and therefore designed oligonucleotide-fluorophore conjugates complementary to the TotalSeq barcodes (5′-CACTGAGCTGTGGAA-AlexaFluor488-3′ for CD5; 5′-TCCCATAGGATGGAA-AlexaFluor647-3′ for TCRβ). Before cell staining, the TotalSeq antibody panel was mixed with oligonucleotide-fluorophore conjugates in a 1:1.5 molar ratio. This mixture was incubated for 15 min at room temperature to allow for oligonucleotide hybridization and then transferred to ice. Cells were then stained with the antibody/oligonucleotide-fluorophore mixture according to the TotalSeq protocol. Cells were stained, washed and resuspended in RPMI with 10% FBS to maintain viability. Cells were sorted using a BD FACSAria Fusion (BD Biosciences).

The CITE-seq experiment was performed following the TotalSeq protocol. Cells were stained, washed and resuspended in RPMI with 10% FBS to maintain viability. We followed the 10x Genomics Chromium Single Cell 3′ v3 protocol to prepare RNA and antibody-derived-tag (ADT) libraries55.

RNA and ADT libraries were sequenced with either an Illumina NovaSeq S1 or an Illumina NovaSeq S4. Reads were processed with Cell Ranger v.3.1.0 with feature barcoding, where RNA reads were mapped to the mouse mm10–2.1.0 reference (10x Genomics, STAR aligner56) and antibody reads were mapped to known barcodes (Supplementary Data 1). No read depth normalization was applied when aggregating samples.

CITE-seq data preprocessing

Before analysis with totalVI, we performed preliminary quality control and feature selection on the CITE-seq data. Cells with a high percentage of UMIs from mitochondrial genes (>15% of a cell’s total UMI count) were removed. We also removed cells expressing <200 genes, and retained cells with protein library size between 1,000 and 10,000 UMI counts. We removed cells in which fewer than 70 proteins were detected of the 111 measured in the panel. An initial gene filter removed genes expressed in fewer than four cells. The top 5,000 highly variable genes (HVGs) were selected by the Seurat v3 method57 as implemented by scVI58. In addition to HVGs, we also selected genes encoding proteins in the measured antibody panel and a manually selected set of genes of interest. After all filtering, the CITE-seq dataset contained a total of 72,042 cells, 5,125 genes and 111 proteins.

totalVI analysis of all CITE-seq data

We ran totalVI on CITE-seq data after filtering (described above), using a 20-dimensional latent space, a learning rate of 0.004, and early stopping with default parameters. Each 10x lane was treated as a batch. When generating denoised gene and protein values, we applied the transform_batch parameter17 to view all denoised values in the context of wild-type samples.

To conduct cell annotation, we stratified cells of the thymus into cell types and states based on the totalVI latent space, taking advantage of both RNA and protein information. We first clustered cells in the totalVI latent space with the Scanpy59 implementation of the Leiden algorithm60 at resolution 0.6, resulting in 18 clusters. We repeated this approach to subcluster cells. We used Vision61 with default parameters for data exploration. Subclusters were manually annotated based on curated lists of cell-type markers17,18, resulting in 20 annotated clusters (excluding one cluster annotated as doublets). We visualized the totalVI latent space in two dimensions using the Scanpy59 implementation of the UMAP algorithm51.

In addition to thymocyte populations previously described, we identified two distinct thymocyte clusters undergoing negative selection, based on expression of Bcl2l11 (BIM), Nr4a1 (NUR77) and Ik2f2 (HELIOS)62 (Extended Data Fig. 1b). The first cluster (Neg. Sel. (1)) emerged from DP (Sig.) adjacent to a cluster of dying cells, and possessed markers of an early wave of negative selection62 like upregulated Pdcd1 (PD-1) and downregulated Cd4/CD4 and Cd8/CD8 (Fig. 1d and Extended Data Fig. 1b,c). The second (Neg. Sel. (2)) emerged from immature CD4+ T cells and possessed markers of a late wave of negative selection62 like upregulated Tnfrsf18 (CD357/GITR) and Tnfrsf4 (CD134/OX40) (Extended Data Fig. 1b,c). Foxp3+ regulatory T cells clustered near mature conventional CD4+ T cells and Neg. Sel. (2) (Fig. 1a). We also detected γδ T cells, NKT cells, B cells, myeloid cells, erythrocytes, a thymocyte population with high expression of interferon response genes63 and a population of mature T cells that had returned to cycling following the cell cycle pause during thymocyte development (Fig. 1a).

We conducted a one-vs-all differential expression test between all annotated cell types, excluding clusters annotated as doublets or dying cells. We identified cell-type markers by filtering for significance (log(Bayes factor) > 2.0 for genes, log(Bayes factor) >1.0 for proteins), effect size (median log fold change (LFC) > 0.2 for both genes and proteins), and the proportion of expressing cells (detected expression in >10% of the relevant population for genes), and sorting by the median LFC. For marker visualization, we selected the top four (if existing) differentially expressed genes and proteins per cell type, arranged by the cell type in which the LFC was highest.

totalVI analysis of positive selection subset of CITE-seq data

To further analyze thymocyte populations with a focus on positively selected cells, we selected the following annotated clusters: Signaled DP, Immature CD4, Immature CD8, Mature CD4, Mature CD8, Interferon signature cells63, Negative selection (wave 2), and Treg cells. With an interest in the variation within thymocyte populations (rather than all cells in the thymus), we selected the top 5,000 HVGs in this subset, as well as genes encoding proteins in the measured antibody panel and a manually selected set of genes of interest. This resulted in a CITE-seq dataset containing 35,943 cells, 5,108 genes and 111 proteins. We ran totalVI on this subset dataset and generated denoised values as described above. We performed Leiden clustering and visualized the totalVI latent space in two dimensions using UMAP as described above.

After visualizing the totalVI latent space of the thymocyte subset, we applied additional filters to restrict to cells on the CD4-CD8 developmental trajectory. We used two resolutions of Leiden clustering (0.6 and 1.4) and sub-clustering as described above to identify and remove clusters of negatively selected cells, Treg cells, gamma-delta-like cells, mature cycling cells, and outlier clusters of doublets, interferon signature cells, and CD8-transgenic-specific outlier cells. After filtering, this dataset contained 29,408 cells that were used for downstream analysis. Differential expression testing of positively selecting thymocytes using pseudotime information is described below.

Pseudotime inference

Slingshot19 was selected for pseudotime inference based on its superior performance in a comprehensive benchmarking study20. Slingshot pseudotime was derived from the UMAP projection of the totalVI latent space. The starting point was assigned to DP cells, and two endpoints were assigned to mature CD4+ and CD8+ T cells. Slingshot pseudotime derived from the full 20-dimensional totalVI latent space was highly correlated with that from the 2-dimensional space (Extended Data Fig. 2a), supporting our use of the 2D-derived pseudotime values for ease of visualization and analysis.

Initial lineage assignment of cells was made on the basis of their genotype (CD4+ T cell lineage for MHCI−/−, AND, and OT-II mice, CD8+ T cell lineage for MHCII−/−, F5, and OT-I mice, unassigned for wild-type mice). However, small numbers of cells in MHC-deficient and TCRtg mice develop along the alternative lineage (particularly in TCRtgs that are Rag sufficient, which might express an endogenous TCR in addition to the transgenic TCR). We therefore added an additional filter of Slingshot lineage assignment weight > 0.5. Cells with a Slingshot lineage assignment weight of < 0.5 along the expected lineage based on genotype were excluded from the remaining pseudotime-based analysis.

In silico flow cytometry and gating

To perform in silico flow cytometry, totalVI denoised protein counts were log-transformed and visualized in biaxial-style scatter plots. Gates in biaxial plots were determined based on contours of cell density. An approximate alignment of gated populations to pseudotime was generated by identifying thresholds classifying adjacent populations in pseudotime by maximizing the Youden criteria.

Adult thymocyte population analysis with fluorescence-based flow cytometry

For thymocyte population analysis in adult mice, 6- to 8-week-old eight-week-old wild-type, MHCI−/− or MHCII−/− mice (described above) were used. Thymi were analyzed from eight mice per genotype (four male and four female). All antibodies are described in Supplementary Data 1.

Thymi were mechanically dissociated into a single-cell suspension, depleted of red blood cells using ACK Lysis Buffer (0.15 M NH4Cl, 1 mM KHC3 and 0.1 mM Na2EDTA). Cells were filtered, washed and counted before being stained with a live/dead stain; Zombie NIR Fixable Viability Kit (BioLegend). Samples were blocked with anti-CD16/32 (2.4G2) and stained with surface antibodies against CD4, CD8, TCRβ, CD5, CD69 and CD127 (IL-7Ra) in FACS buffer (1% BSA in PBS) containing Brilliant Stain Buffer Plus (BD Biosciences). Intracellular staining for GATA3, THPOK, and RUNX3 was performed using the eBioscience FOXP3/Transcription Factor Staining Buffer Set (Thermo Fisher Scientific). All antibodies were purchased from BD Biosciences, BioLegend or eBioscience. Single-stain samples and fluorescence minus one (FMO) controls were used to establish PMT voltages, gating and compensation parameters. Cells were processed using a BD LSRFortessa or BD LSRFortessa X20 flow cytometer and analyzed using FlowJo software (Tree Star). Gates defining all populations were based on in silico-derived gates for all described proteins with the exception of CD127 in the CD4 SM, CD8 SM, CD4 Mat and CD8 Mat populations. In these cases, the CD127 fluorescent antibody did not have comparable sensitivity to the CD127 CITE-seq measurement and was therefore excluded.

Differential expression analysis of positively selecting thymocytes with totalVI

Temporal features (that is, features that are differentially expressed over time) were determined by a totalVI one-vs-all DE test within each lineage between binned units of pseudotime. DE criteria (as above) included filters for significance (log(Bayes factor) >2.0 for genes, log(Bayes factor) > 1.0 for proteins), effect size (median log fold change >0.2 for both genes and proteins), and the proportion of expressing cells (detected expression in > 5% of the relevant population for genes). Top temporal genes were selected as the unique set among the top three differentially expressed genes per time that were differentially expressed in both lineages.

Differences between lineages were determined by a totalVI within-cluster DE test, where clusters were binned units in pseudotime and the condition was lineage assignment (that is, cells within a given unit of pseudotime were compared between lineages). Criteria for DE were the same as above.

To cluster differentially expressed genes into patterns, totalVI denoised gene expression values were standard scaled, reduced dimensions across cells using PCA, and clustered genes using the Leiden algorithm60 as implemented by Scanpy59. For features differentially expressed between lineages, the genes upregulated within a lineage were clustered according to expression within the lineage in which they were upregulated.

To test for enrichment of TCR signaling in differentially expressed gene clusters, we performed a hypergeometric test (phyper). TCR signaling genes were compiled from Netpath34 and a set of genes activated upon stimulation in DP thymocytes64. The background set included all genes considered in DE analysis. P values were adjusted by the Benjamini-Hochberg procedure.

Transcription factor enrichment analysis

To perform transcription factor enrichment analysis with ChEA333, we first selected target gene sets as genes differentially upregulated in one lineage relative to the other in each unit of pseudotime, filtered for significance (log(Bayes factor) >2.0), effect size (median log-transformed fold change >0.2), and detected expression in >5% of the population of interest. For each target gene set, transcription factors were scored for enrichment by the integrated mean ranking across all ChEA3 gene set libraries (MeanRank) based on the top performance of this ranking method33. ChEA3 analysis on gene clusters was performed as above, but using gene clusters as the target gene set.

To generate an overall ranking of transcription factors for their likely involvement in CD4-CD8 lineage commitment, we focused on enrichment in the three units of pseudotime before master regulator differential expression in each lineage (that is, in the CD4+ T cell lineage, the relevant pseudotime units are 4, 5 and 6, before the differential expression of Zbtb7b differential expression at pseudotime 7; in the CD8+ T cell lineage, the relevant pseudotime units are 5, 6 and 7, before the differential expression of Runx3 at pseudotime 8). We excluded the pseudotime unit containing master regulator differential expression from the ranking, as genes differentially expressed at this time could be the result of the master regulator itself enforcing lineage-specific changes rather than the factors driving initial commitment to a lineage. The pseudotime unit containing master regulator differential expression is included in Fig. 4e for visualization, but did not contribute to the ranked order of transcription factors. We also excluded earlier units of pseudotime since these times included very few ( < 15) significantly different genes between the lineages. Finally, pseudotime bins in which a transcription factor was not expressed in at least 5% of the population of interest, did not contribute towards that transcription factor’s ranking. The overall ranking of candidate driver transcription factors was then generated by taking the mean of ranks across the relevant pseudotime units. Note that lower rank and lower score are better (meaning more enrichment).

Transcription factors were annotated by whether they had a known association with TCR signaling. A list of molecules involved in TCR signaling were curated from the NetPath database of molecules involved in the TCR signaling pathway and the NetPath database of genes transcriptionally upregulated by the TCR signaling pathway34. Additional genes related to TCR signaling were curated from literature sources49,65,66,67,68. Transcription factors were also annotated by whether they were known to target either Gata3, Zbtb7b or Runx3 according to ChEA3 databases (that is, Gata3, Zbtb7b or Runx3 appeared in the overlapping gene list for the transcription factor of interest in any ChEA3 query).

Neonatal thymic slice experiments

For neonatal thymic slice experiments, postnatal day 1 (P1) wild-type, MHCI−/− or MHCII−/− male and female mice (described above) were used.

Thymic slices were prepared as previously described69,70, with minor modifications to adjust for the smaller size of neonatal thymi compared to those of adults. Thymic lobes were dissected, removed of connective tissue, embedded in 4% low melting point agarose (GTG-NuSieve Agarose, Lonza) and sectioned into 500 μM slices using a vibratome (VT1000S, Leica). Slices were placed onto 0.4 μM transwell inserts (Corning, 353090) and cultured in 6-well tissue culture plates containing 1 mL of complete RPMI medium (RPMI-1640 (Corning), 10% FBS (Thermo Fisher Scientific), 100 U ml−1 penicillin/streptomycin (Gibco), 1X L-glutamine (Gibco), 55 µM 2-mercaptoethanol (Gibco). Slices were cultured for indicated periods of time at 37 °C, 5% CO2, before being prepared and analyzed by flow cytometry. Due to the practical limitations of using a single pup as a biological replicate, a litter of pups were harvested for thymic slices (approximately six pups/litter and four slices/pup) and three or four thymic slices were randomly allocated to each condition. For neonatal slice data in Figs. 5 and 6, each dot represented a single thymic slice (n = a slice). Statistical analysis was conducted on slices pooled from independent experiments. For neonatal slice cultures containing Cyclosporin A (CsA; Millipore-Sigma, 239835), CsA was serially diluted to indicated concentrations (50–800 ng ml−1) and added directly to the culture medium. FK506 (Tacrolimus; InvivoGen, inh-fk5-5) and U0126 (InvivoGen, tlrl-u0126) were serially diluted in indicated concentrations (0.39-6.3 ng ml-1 and 0.63-10 µg ml−1, respectively) and added directly to culture medium.

Thymic slices were mechanically dissociated into a single-cell suspension, then filtered, washed and counted before being stained with a live dead/stain; Propidium Iodine (BioLegend), Ghost Violet 510 (Tonbo), Zombie NIR, or Zombie UV Fixable Viability Kit (BioLegend). Samples were blocked with anti-CD16/32 (2.4G2) and stained with surface antibodies against CD4, CD8, TCRβ, and CD69 in FACS buffer (1% BSA in PBS) containing Brilliant Stain Buffer Plus (BD Biosciences). Intracellular staining for GATA3, RUNX3, and THPOK was performed using the eBioscience FoxP3/Transcription Factor Staining Buffer Set (Thermo Fisher Scientific). All antibodies were purchased from BD Biosciences, BioLegend or eBioscience. Single-stain samples and fluorescence minus one (FMO) controls were used to establish PMT voltages, gating and compensation parameters. Cells were processed using a BD LSRFortessa or BD LSRFortessa X20 flow cytometer and analyzed using FlowJo software (Tree Star).

Computational multidimensional analyses of flow cytometry data

FCS files were loaded into python using flowIO. Compensation was performed using manually determined compensation values. Data was loaded into Scanpy59 for further processing. Permissive manual gating in python was performed using physical dimension (FSC-W, FSC-A, SSC-A) on manual inspection and dead cells were filtered out based on live/dead staining. Fluorescent channels were normalized to a range of [0, 1]. Clustering was performed using PARC71 with a resolution_parameter = 1.5, keep_all_local_dist=False and jac_std_global = 0.15. This yielded 34 clusters. Clusters were merged based on manual inspections of all fluorescent channels and merging was not performed if at least one fluorescent channel was differentially expressed between two clusters. We used PAGA72 initialization for UMAP embedding. PAGA was computed using 30 nearest neighbors in expression space using cosine distance. For UMAP embedding, we used the following parameters: n_neighbors = 30, metric=’euclidean’, min_dist = 0.3, init_pos=’paga’. For display of proportional changes in cluster frequency we divided the number of cells in each cluster by the total number of cells in the respective sample. We divided those values by the mean over the proportion in the respective cluster in the no drug sample and took the logarithm of this ratio to yield the log fold enrichment of the respective cluster. Seaborn was used for visualization. All computational gates were validated by manual inspection in FlowJo.

In vivo EdU labeling and calcineurin blockade in adult mice

Four- to eight-week-old male and female AND Rag1−/− mice (described above) were intraperitoneally injected with 2 mg EdU (Thermo Fisher Scientific, A10044) in the evening. The next morning (16 h later), mice were injected with 5 μg FK506 (Invitrogen, INH-FK5-5). Thymi were taken for flow cytometry 24 or 48 h after FK506 was administered. Thymi were dissociated and 2 × 106 cells were surface stained for flow cytometry as described above. After surface staining, cells were split, and 1 × 106 were processed using Click-iT EdU Pacific Blue Flow Cytometry Assay Kit (Thermo Fisher Scientific, C10418). The other 1 × 106 were subjected to anti-CD3 crosslinking and p-ERK staining as described below. Flow cytometry and data analysis were performed as described above.

In vitro TCR activation and staining for p-ERK

Approximately 1 × 106 surface stained thymocytes were washed and resuspended in approximately 240 μl serum-free media. Per sample, 10 μl anti-CD3e antibody (clone 145-2C11, Invitrogen 14-0031-85) was added to reach a final concentration of 20 mg ml−1. Working quickly, 7 μl of anti-Armenian hamster IgG crosslinker (Jackson ImmunoResearch Laboratories 127-0051-160) was added to each sample, briefly vortexed, and placed in a 37 °C water bath for 3 min for in vitro TCR activation. For fixation, 1:1 volume of 4% paraformaldehyde was added to each tube and incubated at room temperature for 10 min before being washed in PBS. Cells were resuspended in 900 μl ice-cold methanol by gentle pipetting, and incubated on ice for 30 min. After three washes in PBS, cells were incubated at 4 °C overnight p-ERK antibody (1:20 dilution, BioLegend, 675504). Samples were washed and resuspended for analysis. Flow cytometry and data analysis were performed as described above.

Statistical analyses

Data were analyzed using Prism software (GraphPad). Comparisons were performed using an unpaired t-test, one- or two-way analysis of variance, where indicated in the figure legends. For all statistical models and tests described above, the significance is displayed as follows: *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001. Animals were randomly assigned to experimental or control groups. No statistical methods were used to pre-determine sample sizes but our sample sizes are similar to those reported in previous publications7,31. Data distribution was assumed to be normal, but this was not formally tested. No animals or data points were excluded from the analyses.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.