Dynamics of TCR repertoire and T cell function in COVID-19 convalescent individuals

SARS-CoV-2 outbreak has been declared by World Health Organization as a worldwide pandemic. However, there are many unknowns about the antigen-specific T-cell-mediated immune responses to SARS-CoV-2 infection. Here, we present both single-cell TCR-seq and RNA-seq to analyze the dynamics of TCR repertoire and immune metabolic functions of blood T cells collected from recently discharged COVID-19 patients. We found that while the diversity of TCR repertoire was increased in discharged patients, it returned to basal level ~1 week after becoming virus-free. The dynamics of T cell repertoire correlated with a profound shift of gene signatures from antiviral response to metabolism adaptation. We also demonstrated that the top expanded T cell clones (~10% of total T cells) display the key anti-viral features in CD8+ T cells, confirming a critical role of antigen-specific T cells in fighting against SARS-CoV-2. Our work provides a basis for further analysis of adaptive immunity in COVID-19 patients, and also has implications in developing a T-cell-based vaccine for SARS-CoV-2.


Introduction
The year 2019 ended with the emergence of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) 1 . This disease was officially named coronavirus disease 2019 (COVID-19) by the World Health Organization 2 . Shortly after, COVID-19 outbreak spread globally and became a pandemic disease 3,4 . The genome sequence of SARS-CoV-2 bears 79.5% identity to that of SARS-CoV 5 . Like SARS-CoV and MERS-CoV, SARS-CoV-2 belongs to beta genus Coronavirus in Corornaviridae family 6 .
The T-cell-mediated immune response is one of the primary defense mechanisms of adaptive immune system against virus 7 . T cells orchestrate adaptive immunity following the signalling dictated by their clonotypic T cell receptors (TCRs), which recognize a peptide (8)(9)(10)(11)(12)(13)(14)(15) amino acids) presented by major histocompatibility complex (MHC) 8 . Recognition of peptide-MHC complex (pMHC) by TCR induces activation and differentiation of naive T cells to various functional subsets during acute stages of infection and leads to the eradication of invading pathogens 9 .
Both chains of TCR (α and β) consist of a variable (V) region, junctional (J) region, and constant (C) region. The diversity (D) region connects V and J regions and forms an integral β chain 10 . Thus, TCR recombination process generates highly diverse complementarity-determining regions (CDRs) localized in the TCR α and β chains, forming a functional and highly diverse TCR repertoire. The CDR sequences determine the specificity of TCR binding to pMHCs, of which the third CDR (CDR3) is the most hypervariable and contributes to direct peptide recognition. During viral infection, CD8 + T cells recognize viral peptides and mediate killing of infected cells by releasing granzymes and perforin 9 .
Although considerable efforts have been spent on clarifying the immune response during SARS-CoV-2 infection 11,12 , little is known about the responses of antigenspecific T cells with their diverse TCR repertoire in human for targeting the virus. The immune system generates SARS-CoV-2-reactive T cells against the virus at the beginning of an acute respiratory distress syndrome in COVID-19 patients, and T cells are increased with time 13 . However, initial data from the clinical examinations of severe COVID-19 patients showed that both CD4 + and CD8 + T cells are decreased along with the deterioration of patients' status 14 . Evidence suggests that the SARS-CoV-2 specific T cells are found in both COVID-19 patients and unexposed healthy donors 15,16 . In addition, it was shown that T cells respond to nucleocapsid protein (NP) of SARS-CoV-2 in both the SARS-recovered patients and uninfected individuals, suggesting that the memory response could be induced by previous infections of "common cold" human coronaviruses 16,17 . Moreover, the T cells from SARS-CoV-2 patients recognize multiple domains of NP, and the long-lasting memory T cells from 2003 SARS patients could respond to SARS-CoV-2 NP 18 . Thus, the anti-SARS-CoV-2 T-cell-mediated immune response could be crucial for the immune memory against SARS-CoV-2 infection. Nowadays, data are lacking regarding the dynamics of the generated TCR clones and the associated functional changes in these T cells, which are major players in the anti-viral immune processes. Here, we collected blood samples from COVID-19 patients who have recently become virus-free at different time points (Discharged vs. Follow-up). We performed a comprehensive single-cell analysis to examine both the TCR sequence and functional gene expression in these COVID-19 convalescent subjects.

Global profile of CDR3 length and TCR V/J distribution
To reveal the TCR repertoire dynamic during immune responses to SARS-CoV-2, we assessed 10 recovered patients, in which all the patients were positive to SARS-CoV-2 nucleic acid test during hospitalization and had negative results in a SARS-CoV-2 nucleic acid test on discharge day. For the Discharged group, six of them were sampled 6 days before and 1 day after the discharge day. We further divided the Follow-up group into two subgroups. In particular, three patients were sampled on 7 days post-discharge (Early Follow-up), and four patients were samples between 19 and 40 days post-discharge (Late Follow-up), in which three patients were re-sampled from the Discharged group (Fig. 1, Supplementary  Table S1). Moreover, peripheral blood mononuclear cells (PBMCs) from six healthy donors with the negative nucleic acid test were collected for control (HD). We isolated PBMCs to construct cDNA libraries and performed TCR-α/-β transcriptional sequencing analysis using 10X Genomic Single-cell-based platform (Fig. 1).
We first assessed CDR3 length distribution across the TCR α and β CDR3 sequences from the HD, Discharged, and Follow-up groups to map the T cell repertoire diversity. The CDR3 length of the HD group floated between 5 and 28 aa with a peak of 19.76% cells at 13 aa of TCR α chain. The Discharged group displayed TCR α chain CDR3 length between 4 and 27 aa and a peak of 22.03% cells at 14 aa of TCR α chain. In the Follow-up group, the TCR α chain possessed 5 to 28 aa with the peak at 13 aa (21.62% cells) (Fig. 2a). The TCR β chain CDR3 length was between 5 and 26 aa in the HD group, 5 and 27 aa in the Follow-up group, and 5 and 28 aa in the Discharged groups. The TCR β chain CDR3 length of the three groups was similar with 24.71% cells from the HD group, 23.99% cells from the Discharged group, and 25.21% cells from the Follow-up group at the peak of 15 aa (Fig. 2b).
We then evaluated the usage of TCR V/J repertoires in healthy donors and recovered patients. Our data showed a high level of TRAV1-2 and TRAJ23 expression in the HD group. Both recovered groups exhibited increased expression of TRAV1-2 and TRAJ33 in the TCR α chain construction (Fig. 2c). Remarkably, TRBV20-1 was the top used TRBV gene in recovered patients, with the same high expression pattern of TRBJ2-7 (Fig. 2d, Table 1). While the top used TRBV and TRBJ were TRBV9/TRBV20-1 and TRBJ2-1/TRBJ2-7 in the HD group. Taken together, our result showed a similar CDR3 length and TCRV/J usage frequency in the groups of healthy donors and recovered patients.
Discharged group displayed a higher diversity of TCR repertoire and a lower level of TCR clonal expansion In order to further understand the dynamics of T cell repertoire post-SARS-CoV-2 infection, we analyzed the feature of T cell clonal expansion in the three groups. The percentage of TCR classification count (the number of different TCR sequences in a sample) to total cell number was used to measure the TCR diversity. This percentage in the representative sample of the Discharged group (#2) was much higher (94.51%) than the other two examples (53.75% in HD#3 and 61.21% in Follow-up#3) (Supplementary Fig. S1a). Moreover, the decreased proportion of high expanded clone (≥100), moderately expanded clone , small expanded clone (5-9), rarely expanded clone (2)(3)(4), and increased non-expanded clone (unique) fraction revealed higher clonotype diversity in the Discharged group (Fig. 3a, Supplementary Fig. S1b). In addition, we found a majority (82.95%) of non-expanded unique T cells from the Discharged group compared to the HD and Follow-up groups.
We next calculated the TCR diversity in all the three groups. We found that the Discharged group showed a higher level of TCR diversity at 83.80%, whereas this percentage was similar in both the HD group at 69.83% and the Follow-up group at 66.33% (Fig. 3b). Consistently, we found a significantly higher population of expanded T cells in the HD (32.87%) and the Follow-up (35.98%) groups compared with 17.05% in the Discharged group (Supplementary Table S5). We then calculated the cell number ratio in each TCR clonotype to total cell number from the representative healthy donor and recovered patient groups. The proportion of the top 10 TCR clonotypes in HD#3 and Follow-up#3 exhibited comparable tendency, whereas the Discharged#2 kept a persistently lower ratio with its top 10 TCR clonotypes (Fig. 3c). Consistently, a lower proportion of top 10 TCRs in total TCR repertoire was found in the Discharged group The representative samples showed similar high expression of TRAV12, TRAJ23, TRAJ33, TRBV20-1, TRBJ2-1, and TRBJ27 in the top 10 TCR clonotypes highest used V/J genes in the three groups (Tables 2-4). A dramatic decrease in the frequency of top 10 clones was observed in the Discharged group compared to the HD and Follow-up groups. We next used a global V-J pairing distribution analysis to display the TCR features from three groups ( Supplementary Fig. S2). Several dominant  connections were observed in HD and Follow-up groups, including the TRAV1-2, TRAJ16, TRBV20-1, and TRBJ2-1 in HD group, TRAV1-2 TRAJ33, TRBV20-1, and TRBJ27 in the Follow-up group ( Supplementary Fig. S2a, c). The Discharged group had a relatively uniform connection between different V-J combinations, consistent with an increased level of TCR diversity in this group ( Supplementary Fig. S2b).

Differentially expressed genes related to immune functions in recovered patients
We next focused on the expanded top T cell clones, representing the most active T cells responding against SARS-CoV-2 infection. We linked scRNA-seq and paired scTCR-seq data from the HD group (17,374 cells), the Discharged group (24,040 cells) and the Follow-up group (24,938 Table S4). In each group, the cell number of the top 20 TCR clonotypes contributed 10% to 20% of total cell number (Supplementary Table S4). We first explored CD3E, CD4, and CD8A expression in the TCR paired scRNA-seq data to identify CD3E + CD8A + CD4 − populations. The integrated data from HD vs. Discharged vs. Follow-up groups showed a comparable cell number of CD3E + CD8A + CD4 − cluster with that of CD3E + CD8A − CD4 + cluster ( Supplementary  Fig. S4). Whereas, the CD3E + CD8A + CD4 − clusters contributed substantially to all of top 20 TCR data in three groups' integrated data ( Supplementary Fig. S5, Table S4), which indicated expanded CD8 + T cell clones contributing to immune response to viral infection.

cells) (Supplementary
By comparing the differentially expressed genes from top 20 TCR clonotypes within CD3E + CD8A + CD4 − clusters in healthy donors and recovered patients, we found that both the Discharged and Follow-up groups expressed an increased fold change of IFN and granzymerelated genes expression compared with healthy donors (Fig. 4a, b). In comparison with the HD group, four IFNrelated genes (IFITM1, IFITM2, IFITM3, and IFI6) and IFN activation downstream related genes (ISG15 and TRIM22) increased in the Discharged group, indicating an enhanced anti-viral immunity in the Discharged group (Fig. 4a). Moreover, the highly expressed cytotoxic genes (GZMK and GNLY) also demonstrated that the T cell clones from the Discharged group possessed a powerful killing function against the SARS-CoV-2 infected cells. In Follow-up vs. HD analysis, the Follow-up group exhibited similar highly expressed IFN-related gene IFITM3 and granzyme/perforin-related genes (GZMB, GZMK, and GNLY) (Fig. 4b). Thus, the anti-viral immunity and the cytotoxicity persisted at a high level in the recovered patients at both Discharged and Follow-up time points.
Interestingly, we found the key IFN regulator gene IRF1 downregulated in both Follow-up and Discharged groups, which may contribute to the restriction of viral replication in the T cells of SARS-Cov-2 infected patients (Fig. 4a, b). Furthermore, with a direct comparison between the Discharged group and the Follow-up group, we found a significantly higher expression level of CD8 + T cell antiviral functional genes in the Discharged group, most of which were the IFN-related genes (IFITM1, IFITM2, IFI6, TRIM22, and ISG15) (Fig. 4c). Previous studies indicated that IFITM proteins were employed in the SARS infection for restricting virus fusion 19,20 . The recent reported LY6E, which impairs the coronavirus fusion and inhibits the SARS-CoV-2 infection 21 , was also highly expressed in the Discharged group. In addition, a mixed expression pattern of cytotoxic genes was observed, i.e., GZMA and GZMH were highly expressed in the Discharged group, whereas GZMB and GNLY showed an increased expression in the Follow-up group (Fig. 4d).
Overall, the highly expressed IFITM family and LY6E genes in the Discharged group implied a better defense against SARS-CoV-2 infection by the resistance of target cells to viral fusion than other two groups. Both Discharged and Follow-up groups maintained a powerful viral elimination competence with a strong killing function. These data suggested, along with the recovery from SARS-CoV-2 infection, that the anti-viral fusion potency would revert to the basal level quickly, while the killing features of CD8 + T cells persist for a longer time.

IFN-related anti-viral features in the discharged patients
The features of immune genes in CD8 + T cells from top 20 clonal T cells indicated their pivotal effector function in the anti-viral immune responses. We next illustrated the significant properties of these expanded CD8 + T subsets in COVID-19 recovered patients by Gene Set Enrichment Analysis (GSEA) with expressed genes in the Discharge group vs. the HD group. Via gene expression in each cluster, we identified the subsets of top 20 clonal T cells (Supplementary Figs. S7-S9). Interestingly, the Discharged and Follow-up groups contributed the majority of CD8A + KLRB1 hi CXCR4 hi terminal differentiation T subset and CD8A + CD160 hi T effector subset (Fig. 5a), in which the Discharged group revealed anti-viral hallmarks by GSEA contrast to HD group (Fig. 5b). Furthermore, we analyzed biological process (BP) based on Gene Ontology (GO) database 22,23 of differentially expressed genes of CD8 + T subsets in top 20 clonal T cells. Comparing with the HD Table 1 The highest TRAV/J usage in each group .   Pt#  TRAV  TRAJ  TRBV  TRBJ   HD  TRAV1-2  TRAJ23  TRBV9  TRBJ2-1   Discharged  TRAV1-2  TRAJ33  TRBV20-1  TRBJ2-7   Follow-up  TRAV1-2  TRAJ33  TRBV20-1  TRBJ2-7 group, the patients displayed anti-viral immune response BPs (Fig. 5c, d). Particularly in the Discharged group, most of the top 5 BPs (3/5) directly linked to interferon-related anti-viral immune response (Fig. 5c). Moreover, the gene concept network analysis of differentially expressed genes also supported the pathway analysis of Discharged group (Supplementary Fig. S10). Consistently, the expressed genes of total CD8 + T cells in the Discharged group displayed exactly the same two interferon-related hallmarks compared to the HD group based on GSEA ( Supplementary Fig. S11a). Moreover, total CD8 + T cells also exhibited 8 anti-viral BPs out of the top 10 BPs (Supplementary Fig. S11b).
Taken together, the GSEA and BP results of CD8 + T cells from top 20 clones were remarkably similar to the analysis results from total CD8 + T cell population, although top 20 TCR clonotypes contributed <20% of the total T cells. These data indicate top 20 T cell clones were the active populations representative of IFN-dependent anti-viral functional features at the discharge stage.

Alteration of metabolism in the Follow-up patients
Top 20 clones of CD8 + T cells in the Follow-up group displayed 2 significantly enriched gene sets about metabolism comparing with the HD group, and none of the viral infection-related gene sets was identified with GESA ( Fig. 6a). The GSEA result based on total CD8 + T cell indicated the same metabolic features in the Follow-up group ( Supplementary Fig. S12).
We next divided the Follow-up group into Early and Late subgroups based on their sampling time points (Fig. 1) (Supplementary Table S1). By direct comparison with the HD group in top 20 clones of CD8 + T cells, the Early Follow-up group displayed strong metabolism features: all of the top 3 BPs of differentially expressed genes were related to metabolism ( Supplementary Fig. S13). While in contrast to the HD group, the Late Follow-up group showed fatty acid metabolism feature specifically by GSEA (Fig. 6b). Previous studies indicated that the memory T cells increased fatty acid oxidation with more spare respiratory capacity for rapidly recall upon challenge 24,25 . In the top20   Table 3 Top 10 TCR clonotypes information of representative sample from Discharged group (Discharged#2). clones' data, we have identified two memory T subsets: central memory T subset and effector memory T subset ( Supplementary Fig. S7). We focused on analyzing CD8A + GZMK + effector memory T subset, of which cells could rapidly differentiate into effector cells against the infection 26 . We found that main constituent of this effector memory T subset was from the Follow-up group (Fig. 6c). The transcription factor analysis indicated that the Late Follow-up group displayed specifically active transcription factor SP1, linked to lipid metabolism and cholesterol biosynthesis [27][28][29] , in CD8A + GZMK + effector memory T subset ( Supplementary Fig. S14).
Via Monocle analysis of paired Discharged and Late Follow-up samples in top 20 TCRs paired scRNA-seq data, we found that the Late Follow-up group contributed a larger proportion in CD8A + GZMK + effector memory T cell population, and distributed along the terminal-exhausted differentiation. While the Discharged group located relatively close to the beginning of trajectories (Fig. 6d). Moreover, the Monocle analysis showed that the Late Follow-up group contributed all cells in CD8A + CXCR5 + central memory T subset (Fig. 6e). These data demonstrated that the Follow-up group in top 20 CD8 + T cell clones displayed multiple metabolic features. In particular, the Late Followup group exhibited remarkable memory T cells metabolic features, coinciding with CD8 + T cells differentiation during anti-viral immune response.
Overall, we assumed that the patients possessed a higher proficiency against SARS-CoV-2 infection during recovery stages, and vital process was changed from antiviral immune response to metabolism adaptation after discharge.

Discussion
This study used a combination of scTCR-seq and scRNA-seq analysis to measure both TCR clone dynamics and functional gene expression in COVID-19 convalescent individuals. Three key findings are shown from our systemic analysis: (1) TCR repertoire diversity decreased quickly after recovering from COVID-19, without changing the global frequency of VDJ gene usage; (2) The dynamics of TCR repertoire was linked to a profound change of gene signatures from anti-viral response to metabolism adaptation; (3) The top expanded T cell clones (~10% of total T cells) determined the principal features of CD8 + T cells, indicating the vital role of antigen-specific T cells in fighting against SARS-CoV-2 infection.
Our study demonstrated exciting features of T cell responses during COVID-19 recovery process. A previous study provided evidence that SARS-CoV-2 infection resulted in T cell reduction due to the functional impairment of dendritic cells, and these weakened/ delayed CD8 + T cell responses could contribute to acute COVID-19 pathogenesis 30 . Our data indicated a low level Table 4 Top 10 TCR clonotypes information of representative sample from Follow-up group (Follow-up#3). of clonal expansion CD8 + T cells, with a strong anti-viral function (both defending and killing effects), which could serve as the major players in combating the virus before or during discharge time point (Fig. 7b). Our findings, at least partially, solved a conundrum: How a dramatically decreased level of CD8 + T cell can clear the virus in most SARS-CoV-2 infected individuals.
The granzyme favors the cytolysis of viral infected cells with different properties 31,32 . GZMA, GZMB, and GZMK act as pro-inflammatory factors [33][34][35] , of which GZMB is the most powerful pro-apoptotic granzyme 36 , and GNLY encodes granulysin protein binding to cells via electric charge to lysis target cells 37 . Furthermore, the studies in mice suggested that GZMA and GZMB possess distinct cytotoxic sensitivity to different target cells via perforinmediated apoptosis 36,38 . The Discharged group showed high expression of GZMA (Fig. 4d), which had relatively low cytotoxicity 36 . While GZMB, a greater cytotoxic granzyme 36 , was highly expressed in the Follow-up group. The cytotoxic genes expression indicated that viral clearance was not the end of the immune response, and afterward, immune system preserved cytotoxicity in a period with different characteristics. Based on the distinct expression of granzymes in different groups, we assume that: immune system eliminates infected cells and viruses via a cytotoxic mediator network and is a manifold system; the cytotoxic function is enlarged with pro-apoptotic factors such as GZMB and GNLY over time; in late-stage afterward viral clearance, immune system maintains proapoptotic features, which could efficiently avoid repeat infection in the short term.
IRF1 is a key regulator that contributes to anti-viral response. Previous studies indicated that IRF1 polymorphisms (down-regulation) were associated with RNA virus HIV-1, infection resistance 39 and modulated by HIV-1 to enhance viral replication and penetrated immune system defense at least at the initial stage 40 . Our data revealed that during SARS-CoV-2 infection, a singlestrand RNA virus, CD8 + T cells exhibited low IRF1 and high IFITM levels in the Discharged and Follow-up groups. Thus, we hypothesized that immune system probably used a similar mode against HIV-1 infection by down-regulated IRF1 and up-regulated IFITM to avert the direct viral infection in CD8 + T cells.
In addition, we found 1-6 weeks after discharge (followup time point), the clonal expansion increased, and the diversity of TCR repertoire decreased to basal level, comparable with healthy donors, which were not exposed to SARS-CoV-2. Moreover, the clonally expanded CD8 + T cells in the Follow-up group exhibited features of active metabolic reprogramming, supporting a post-infection recovery for the T cell system (Fig. 7c). In addition, the Early Follow-up group showed the ATP metabolism feature in biological process analysis, which corresponded to the substantial spare respiratory capacity of long-lived CD8 memory T cells after clearance of infection to produce sufficient ATP and promote CD8 memory T cells survival 41,42 . Previous studies suggested that upregulated carnitine palmitoyl transferase-I (the first step in fatty acid oxidation) in CD8 memory T cells increased fatty acid oxidation and mitochondrial respiratory capacity 24 . Our results indicated that in the late Follow-up group, fatty acid metabolism increased significantly. Therefore, immune system could quickly respond to viral challenge.
Overall, in the Healthy group, the minor clones were immune surveillance cells and maintained homeostasis of immune response (Fig. 7a); in the Discharged group, most low expanded minor clones were defensing against virus infection (IFN-related) and cytotoxic (granzyme) clones (Fig. 7b); and the Follow-up group possessed GZMBcytotoxic clones and the major metabolic clones (Fig. 7c). Our data indicated a low level of clonal expanded CD8 + T cells, with a strong anti-viral function, which could serve as the pivotal factor during discharged time point. The clonal expansion increased, and the diversity of TCR repertoire decreased to basal level as the healthy donors within 6 weeks after discharge. Furthermore, the clonally expanded CD8 + T cells in the Follow-up group exhibited features of active metabolic reprogramming, revealing a potential postinfection recovery for T cell immune system.
We believe that investigating the dynamics of TCR repertoire could provide a valuable tool to monitor T-cellmediated immune responses to potential SARS-CoV-2 vaccines. This area of research will need to be expanded using a larger cohort of patients, which could be sampled at different time points to provide more detailed information in near future for designing the best way to increase the number of T cells and enhance their (see figure on previous page) Fig. 5 GSEA and biological process analysis for CD3E + CD8A + CD4 − clusters of top 20 TCR clonotypes paired scRNA-seq data from Patient group vs. HD group. a Distribution of each group in CD8A + KLRB1 + CXCR4 + terminal differentiation T subset (left panel) and CD8A + CD160 hi effector T subset (right panel). b Using GSEA to analyze expressed genes from CD3E + CD8A + CD4 − clusters of top 20 TCR paired scRNA-seq data, 2 gene sets of interferon response significantly upregulated in Discharged group comparing with HD group (n = 6 per group). NES, normalized ES; FDR, false discovery rate; NOM p, normalized p value. Top 20 BP enrichment analysis of DEGs from CD3E + CD8A + CD4 − clusters of top 20 TCR paired scRNA-seq data, which were upregulated in Discharged group vs. HD group analysis (n = 6 per group) (c) and Follow-up vs. HD groups (n = 7 in Follow-up group, n = 6 in HD group) (d). DC, Discharged group; FU, Follow-up group; HD, Healthy Donor group.
anti-viral function. Moreover, our presented study could form the basis for developing a novel T cell vaccine to combat SARS-CoV-2 infection. Experimental model and subject details Samples collection

Materials and methods
The peripheral blood samples were collected from 10 patients and 6 Healthy Donors (HD). Six patients, sampling on the discharged day or within 1 week before discharge, were identified as Discharged group (DC). The samples of the Follow-up group (FU) were collected during 1 week to 6 weeks after discharge. The Follow-up group was divided into the Early Follow-up group (EFU) and the Late Follow-up group (LFU) according to the days from discharge. Briefly, 3 patients of the Early Follow-up group were sampled on 7 days post-discharge. The Late Follow-up group contained 4 patients, in which 3 patients from the Discharged group were re-sampled between 19 and 40 days post-discharge and 1 patient was sampled on 30 days post-discharge (Fig. 1a). Detailed patient and grouping information were provided in Supplementary  Table S1. This study was under the tenets of the Declaration of Helsinki and was approved by the Medical Ethical Committee at Shanghai Jiao Tong University School of Medicine, China.

Method details PBMC isolation, single-cell RNA sequencing, and analysis
For the single-cell RNA and TCR sequencing, isolated PBMCs were prepared by the standard density gradient centrifugation with Ficoll-Paque (GE Healthcare, Uppsala, Sweden). Prepared PBMCs were used to construct scRNA-seq libraries. The Chromium Single-Cell 5' v3 Reagent Kit (10X Genomics, Pleasanton, USA) was used as per the manufacturer's instructions. The Cell Ranger software (version 3.1.0) was used to process the single-cell expression data and align data to the GRCH38 reference genome, of which the corresponding vdj package was used for TCR v/d/j analysis (Supplementary Table S2). The gene expression matrixes of samples were converted to (see figure on previous page) Fig. 6 GSEA for CD3E + CD8A + CD4 − clusters of top 20 TCR clonotypes paired scRNA-seq data from Follow-up group vs. HD group and pseudo-time analysis of paired samples from Discharged and Late Follow-up groups. a Using GSEA to analyze expressed genes from CD3E + CD8A + CD4 − clusters of top 20 TCR paired scRNA-seq data, 2 metabolism-related gene sets significantly upregulated in Follow-up group comparing with HD group (n = 7 in Follow-up group, n = 6 in HD group), and b a fatty acid metabolic gene set significantly upregulated in Late Follow-up group vs. HD group analysis (n = 4 in Late Follow-up group, n = 6 in HD group). NES, normalized ES; FDR, false discovery rate; NOM p, normalized p value. c Distribution of each group in CD8A + GZMK + T effector memory subset.  Seurat objects via Seurat (v3.2.0, R package) 43 in R (v3.6.0). According to samples' variation, the following criteria were applied to each sample: mitochondrial gene percentage <5%; gene number more than 200 and less than 2200 (for #DC3), 2500 (for #HD1/2/3), 3000 (for #DC2/4/5/6 and samples from Follow-up group), 4000 (for #DC1 and #HD4/5), and 4500 (for #HD6). Furthermore, the cells, expressing all of TCR α chains, TCR β chains and transcriptional RNA data, were kept.  Tables S3 and S4). The gene expression matrixes were normalized and calculated with the top 4000 variable genes by 'NormalizeData' and 'FindVariableFeatures' functions in the Seurat package. The gene expression matrixes were integrated together using 'FindInte-grationAnchors' and 'IntegrateData' functions with the 50 dimensions from CCA to correct the batch effect from samples ( Supplementary Fig. S3). The integrated data were saved for following analysis. We scaled the integrated data with default parameters in 'ScaleData' function. The 50 principal components were used in the principal component analysis and followed by UMAP. We applied top 20 principal components for 'FindNeighbors' and the following 'FindClusters' functions for clustering. Moreover, data were saved for the following differentially expressed genes (DEGs) analysis. To visualizing CD3E, CD4, and CD8A genes expression, 'FeaturePlot', 'VlnPlot', and 'CombinePlot' functions were performed. CD4 + and CD8 + clusters were identified by CD3E, CD4, and CD8A genes expression. CD8 + clusters from integrated data were sorted with CD3E + CD4 − CD8A + genes expression via the 'subset' function in Seurat and saved for following GSEA analysis.
Top 20 TCR clonotypes CD3E + CD4 − CD8A + cluster analysis Using Cell Ranger vdj package, the V/D/J and CDR3 sequences of TCR α and β chains were paired by the barcode and ranked the TCR clonotypes by the abundance. The cells, which expressed top 20 TCR clonotypes in each sample, were analyzed with the same approach of Seurat. Briefly, the cells expressed top 20 TCR clonotypes in previous created integrated data were selected and applied 'ScaleData', 'RunPCA', 'RunUMAP', 'FindNeighbors', and 'FindClusters' functions with the same parameters as mentioned before. The data were saved for subsequent differentially expressed genes analysis and gene set enrichment analysis. Moreover, we applied 'FeaturePlot', 'VlnPlot', 'CombinePlot', and 'Dot-Plot' functions for identifying CD4 + and CD8 + clusters by gene markers' expression. CD8 + clusters from top 20 TCR clonotypes paired integrated data were sorted with CD3E + CD4 − CD8A + genes expression via the 'subset' function in Seurat. Then, the functional gene expression in CD8 + T cell clusters was visualized by 'FeaturePlot' function in Seurat. In CD8 + clusters from integrated data, CD8 + T cell functional genes were performed using 'DotPlot' function with 'dot.scale' as 8.

Cell type annotation
The corresponding cell types of the clusters were annotated manually in accordance with known gene markers ( Supplementary Figs. S6, S9) 44,45 .

Differential expression analysis and visualization
The differentially expressed genes in CD4 + subset, and CD8 + subset were identified using 'FindMarkers' function with subset.ident parameter as CD3E + CD4 + CD8A − and CD3E + CD4 − CD8A + clusters, respectively, in abovementioned data. Moreover, we compared the Discharged group to the HD group, the Follow-up group to the HD group, and the Discharged group to the Follow-up group differential expressed genes in CD4 + and CD8 + subsets by ident.1 parameter. The differentially expressed genes were up-regulated and down-regulated according to the avg_logFC value >0 and <0. For visualizing the genes differential expression, we applied 'ggscatter' function in ggplot2 (v3.3.2, R package) with 'size' as 1 and true of 'repel', the default setting 'theme_base' function in ggthemes (v4.2.0, R package) and the default setting 'geom_hline' and 'geom_vline' functions in ggpubr (v0.4.0, R package).

Gene set enrichment analysis
The data from previous Seurat applications were employed in gene set enrichment analysis (GSEA). We added the group information in the data and got the gene expression matrix by 'GetAssayData' function in Seurat with 'slot' parameter as data. The gene expression matrix of each sample was sorted, and the mean value of every gene was calculated by 'rowMeans' function. The gene expression matrix file (".gct" format) and phenotype file (".cls" format) of samples were loaded into GSEA (v4.

Single-cell development trajectory reconstruction
The Monocle2 (v2.14.0, R package) was applied to construct single-cell trajectories to discover developmental transitions. The clustering top 20 TCR clonotypes scRNA-seq data from paired samples in Discharged, and Late Follow-up group were performed the pseudo-time analysis. Briefly, genes in which the count of cells expressing is greater than or equal to 50, were used to define cells' progress and clustering in 'differentialGeneTest' and 'setOrderingFolter' functions. 'DDRTree' was applied in the 'reduceDimension' function to reduce dimensions, and 'plot_cell_trajectory' function was used for visualization.

Transcription factor analysis
The top 20 clones' gene expression matrixes and annotation information of each group were uploaded to IRIS3 48 website (https://bmbl.bmi.osumc.edu/iris3/submit.php) for transcription factor analysis. The regulon analysis of each group was performed with Job ID 20210620233403 for late Follow-up group, Job ID 20210620212155 for early Followup group, Job ID 20210620190351 for Discharged group, Job ID 2021062104159 for the HD group.

TCR diversity calculation
The ratio of TCR classification count to total cell number in each sample was used to measure the clonotype diversity of samples. Briefly, we defined the number of different TCR sequences as the TCR classification count, and the percentage of the sample's clone diversity = (TCR classification count/total cell count) × 100%. Thus, the sample displayed a low ratio that means the sample possessed either decreased TCR classification count or increased total cell number, of which the TCR diversity is lower.

Statistical analysis
The statistical analysis was performed by Prism 7 software with unpaired student's t-test in the two-group analysis. *P values < 0.05, **P values < 0.01, ***P values < 0.001.