Quantitative and qualitative characterization of expanded CD4+ T cell clones in rheumatoid arthritis patients

Rheumatoid arthritis (RA) is an autoimmune destructive arthritis associated with CD4+ T cell-mediated immunity. Although expanded CD4+ T cell clones (ECs) has already been confirmed, the detailed characteristics of ECs have not been elucidated in RA. Using combination of a single-cell analysis and next-generation sequencing (NGS) in TCR repertoire analysis, we here revealed the detailed nature of ECs by examining peripheral blood (PB) from 5 RA patients and synovium from 1 RA patient. When we intensively investigated the single-cell transcriptome of the most expanded clones in memory CD4+ T cells (memory-mECs) in RA-PB, senescence-related transcripts were up-regulated, indicating circulating ECs were constantly stimulated. Tracking of the transcriptome shift within the same memory-mECs between PB and the synovium revealed the augmentations in senescence-related gene expression and the up-regulation of synovium-homing chemokine receptors in the synovium. Our in-depth characterization of ECs in RA successfully demonstrated the presence of the specific immunological selection pressure, which determines the phenotype of ECs. Moreover, transcriptome tracking added novel aspects to the underlying sequential immune processes. Our approach may provide new insights into the pathophysiology of RA.


Single-cell sorting
Memory CD4 + T cells in the PB of RA1 and RA2 and in the synovium of RA1 were single-cell sorted by FACS (MoFlo XDP, Beckman Coulter).

Single-cell whole transcriptome amplification
Single-cell whole transcriptome amplification was performed as an initial process for every sample in the single-cell analysis. We followed the original Kurimoto method, 1 except that the reverse transcription duration was increased from 5 minutes to 50 minutes because this modification increased the success rate of TCR CDR3 sequencing. Pre-amplified cDNA was used for the TCR repertoire and gene expression analysis. Briefly, each single cell was directly sorted into 5 μL lysis buffer in the PCR tube, followed by cell lysis and heat denaturing. Reverse transcription was performed by SuperScript III (Invitrogen) and oligo-dT primer (V1_dT_24), followed by the exonuclease I (Takara) digestion of unreacted primers. Poly-A was added by TdT (Invitrogen). Each single-cell cDNA was split into 4 tubes to minimize the stochastic effects of amplifying small samples by PCR, and whole transcriptome amplification was performed by 20 cycles of PCR with oligo-dT primers (V1_dT_24 and V3_dT_24). The PCR products of each single cell (in 4 tubes) were mixed and purified using the QIAquick PCR purification kit (Qiagen). All primer sequences in this method section were shown in Table S4.
Single-cell TCR repertoire analysis PCR strategy: We focused on the beta chains of TCR because they have generally a more strict allelic exclusion than alpha chains. We prepared forward primer sets (V_beta_common_2-24) to cover all of the functional TRBV as previously described 2 and also designed a reverse primer at TRBC (C_beta_332). We performed 50 cycles of multiplex PCR using pre-amplified single-cell cDNA as a template. PCR products were analyzed by electrophoresis on a 1.5% agarose gel. The fragment containing the target PCR product was cut out and purified using the QIAquick gel extraction kit (Qiagen).
Sequencing and mapping: Sequencing was performed by the Sanger method. Sequencing data was analyzed by Data analysis: The frequency of each clone was calculated by the number of cells with the same CDR3 sequence divided by the total number of sorted single cells. In the single-cell analysis, expanded CD4 + T cell clones (ECs) were defined by the clones detected more than once. The most expanded memory CD4 + T cell clones (mECs) were defined by the clones with the highest frequency in the single-cell analysis. Non-expanded CD4 + T cell clones (NEC) were defined by the clones detected only once.

Single-cell quantitative PCR
We used pre-amplified single-cell cDNA as a template. Due to the limitation of the cDNA sample volume, we divided single cells into several groups according to the TCR sequences. Every single-cell cDNA of each group was mixed and quantitative PCR (qPCR) was performed for each pooled sample. qPCR was performed in the CFX Connect Real-Time PCR Detection System (BioRad) by using QuantiTect SYBR Green PCR Kits (Qiagen). The expression of the target genes was calculated based on the 2 -delta Ct method, in which delta Ct = ( Ct of the target gene -Ct of the internal control gene) 4 . All gene specific primers were listed in Table S4.

Single-cell RNA-seq
Library preparation and sequencing: We performed 9 cycles of PCR with oligo-dT primers (T7-V1 and V3_dT_24) using pre-amplified single-cell cDNA as a template and PCR products longer than 200bp were cut out on a 2% agarose gel and purified using the QIAquick gel extraction kit (Qiagen) as described in the original Kurimoto method. 1 We required an additional 30 cycles of PCR with the same conditions in order to obtain a sufficient amount of the PCR product for the NGS analysis. After the final PCR products were fragmented using the Covaris S2 system, they were ligated with index and adaptor sequences for the Ion Torrent platform. Sequencing was performed on Ion PGM using 318 v2 chips (Applied Biosystems).
Quality control, mapping, and data analysis: Raw sequencing data was trimmed and mapped with default settings on the CLC Genomics Workbench v.7.5. 5 The Ensembl gene model (GRCh37.72) was used. Gene expression levels were calculated by reads per kilobase of the exon model per million mapped reads (RPKM) values 6 . RPKM values were log-transformed ( log2 (RPKM+1) ) and normalized by the quantile method. These normalized RPKM values were used as transcriptome data for every analysis of RNA-seq.

Gene set enrichment analysis:
Regarding the analysis of differentially expressed gene sets between 2 groups, a gene set enrichment analysis was performed as previously reported. 7 The number of permutations was set at 1000 and the false discovery rate (FDR) cut-off was set at 0.05.

II, NGS TCR repertoire analysis
Library preparation and sequencing: We also focused on the beta chain of TCR in the NGS TCR repertoire analysis.
We modified the 5'-RACE-based PCR strategy of the TCR region described previously. 8 Briefly, total RNA was purified using the RNeasy Micro kit (Qiagen) and reverse transcribed with SuperScript III (Invitrogen) and gene-specific primers of the TCR constant region (hTCR-CA-R2.2, hTCR-CB1-R3.2 and hTCR-CB2-R3). Poly-G was added on the 5' end of each cDNA sample and they were split into four tubes to minimize the stochastic effects of PCR. In the 1 st round of PCR, touch-down PCR was performed by the oligo-dC forward primer (Oligo dc-adaptor2) and reverse primers in the TCR constant region (hTCR-CA-R7 and hTCR-CB1-R9) using the same conditions as the original method, 8 except that the final number of amplification cycles was reduced to 16 cycles. After the 1 st PCR products of each sample (in 4 tubes) were mixed together, they were diluted 20-fold and used as a template in the subsequent PCR process. The 2 nd round of PCR was performed using a forward primer with an adaptor region (AP2) and reverse primer in the TCR constant region of the beta chain (TRBC_HTS_3), using the following conditions: 1) 96°C for 2 minutes, 2) 18 cycles of 96°C for 15 seconds, 63°C for 30 seconds, and 72°C for 1 minute, and 3) 72°C for 3 minutes. The 2 nd PCR products were purified using the QIAquick PCR purification kit (Qiagen). We used high-fidelity polymerase (PrimeStarGXL, Takara) in both rounds of PCR. After the final PCR products were fragmented by the Covaris S2 system, they were ligated with index and adaptor sequences for the Ion Torrent platform. Sequencing was performed on Ion PGM using 318 v2 chips (Applied Biosystems).

NGS TCR repertoire data analysis
5 Quality control and mapping: Quality control of raw sequencing data was performed on the CLC Genomics Workbench v.7.5. 5 After adaptor removal, further trimming was performed using the following settings: quality=0.01, no ambiguous nucleotide was allowed and reads below the length = 50bp were discarded. Mapping to the TCR region, the identification of TRBV and TRBJ genes, the extraction of CDR3 sequences, and error corrections were performed with default settings by MiTCR. 9 Functional CDR3 sequences were used for the TCR repertoire analysis. Fig. S1, we set the target total read count for each sample to be more than 100,000. However, the number of reads with functional CDR3 slightly varied between samples. In order to cancel the potential impacts of the read depth on the TCR repertoire analysis, functional CDR3 sequences were shuffled and an equal number of sequences (set as the minimum number within samples) were randomly extracted in each data set ( Fig. 2 and 3).

Frequency calculation:
The frequency of each clone was calculated by the number of reads of each clone divided by the total number of reads with functional CDR3 sequences. In the NGS analysis, ECs were defined by clones detected more frequently than 0.1% or 0.2% (Fig. 2). In order to assess the distribution of target clones within different subsets of CD4 + T cells (Fig. 3), frequencies assessed by the TCR repertoire analysis were multiplied by the frequency of each CD4 + T cell subset within all CD4 + T cells assessed by FACS in order to correct for the subset size.

S1)
The advantage of the NGS TCR repertoire analysis was the high throughput read and relative simpleness. However, it is limited by the high error rate and PCR bias during the library preparation process. We first examined the impact of error correction pipeline MiTCR 9 and estimated optimal read depths. We randomly re-sampled several pairs of datasets with the same number of reads from 2 NGS sequence data (RA1 and RA2), and calculated the consistency rate of CDR3 sequences between pairs with or without error corrections (consistency rate = total number of consistent clones between set 1 and set 2 / total number of unique clones in set 1 and set 2). The chances of detecting consistent reads between pairs generally become larger as the depth of the sequence became deeper. However, reads originating from sequencing errors increased linearly as the depth of the sequence became deeper and the majority of error-derived reads were counted as a "unique clone" by the definition due to the random nature of sequencing errors. Therefore, we observed that the consistency rate curve had a peak and decline phase without any error correction (Fig. S1 left).
When sequencing errors were corrected with default settings by MiTCR, 9 the consistent rate curve had an upward-sloping shape and finally became a plateau as the samples were exhaustively sequenced (Fig. S1 right).
These results verified the robustness of the error correction process and indicated 100,000 reads could cover the majority of clones in each sample. -cell analyses (Fig. S2) We performed a validation experiment to confirm the correctness of our NGS TCR repertoire analysis pipelines by FACS and single-cell TCR repertoire analyses with Sanger sequencing, neither of which were affected by PCR bias.

Validation of the NGS TCR repertoire analysis by FACS and single
Comparisons of TRBV usage data between FACS and NGS TCR repertoire analyses revealed a strong correlation ( Fig.   S2A and S2B). Evaluations of the frequency of each clone also showed a significant correlation between single-cell and NGS TCR repertoire analysis (Fig. S2C), although the correlation coefficient was smaller than that for TRBV usage data. We speculated that the wider dynamic range of the NGS TCR repertoire analysis for low-frequency clones hampered a strong correlation. In summary, we verified that our NGS TCR repertoire analysis pipeline had sufficient correctness.

7
Supplemental figure 1 Validity of error corrections and estimation of optimal read depths in the NGS TCR repertoire analysis One equal-sized pair of data were randomly re-sampled (each read count was shown in the X-axis) from whole sequence data obtained by the NGS TCR repertoire analysis of memory CD4 + T cells from RA1 and RA2. The consistency rate of the CDR3 sequence between pairs with or without error corrections was shown (consistency rate = total number of consistent clones between set 1 and set 2 / total number of unique clones in set 1 and set 2).  Tfh