Background & Summary

The immune repertoire is a collection of trans-membrane proteins that assist with the recognition of different antigens and initiate an immune response. These proteins, located on the surface of T cells and known as T cell receptors (TCRs), are made of several chains (alpha through delta) and while sharing a general molecular structure, each TCR is slightly different. This repertoire has the potential to generate more than 10151 different T cell receptors in humans. Once activated, the immune system sees major changes in its T cell composition; some T cells undergo clonal expansion2 and proliferation3, which later subsides. These dynamics govern the outcome of the immune response, and are key to understanding both the disease itself and its effective treatment4. Fig. 1 is a schematic view of the immune response in the face of infection, which also highlights the temporal nature of the response, rather than a binary state.

Fig. 1
figure 1

Stages of Adaptive Immune Response. This schematic view demonstrates the stages of infection (numbered), the level of pathogen presence in the host body, and the involvement of the adaptive immune response at each stage (black line).

The immune system must face a varied list of potential threats; be it viruses, bacteria, or the body’s own malfunctioning cells. Recruitment and activation of immune cells is needed. For this to happen, the immune system must be diverse enough to interact with different antigens. This is achieved with three cellular and molecular mechanisms: 1) the recombination of the T cell alpha genes and beta genes by the RAG1–2 protein complex during T cell development, 2) the stochastic nature of the formation of the third complementarity-determining region (CDR3) of the T cell, and 3) the combination of alpha and beta chains. This entire process is termed V(D)J recombination5,6 (Fig. 2) and is key to the development of the adaptive immune system, and the correct formation of the rearranged receptors is critical to their association with  antigens.

Fig. 2
figure 2

Schematic view describing the formation of alpha (left side) and beta (right side) chains of the TCR. These processes are collectively termed V(D)J recombination.

Immune checkpoint inhibitors (“ICI”)7,8,9 are used as treatment for several types of cancer. This kind of treatment will suppress natural checkpoint and regulators of the immune system, resulting in an amplified immune response, one that is more likely to deal with cancerous cells. This type of treatment has recently been employed in colorectal cancer (“CRC”), with several positive results reported, specifically in metastatic CRC that is mismatch-repair-deficient and microsatellite instability-high (dMMR–MSI-H)10. The fundamental involvement of the immune response in this type of cancer makes ICI and its effect on the T cell repertoire especially important.

CTLA4 is a negative regulator of the immune response11,12, aimed at preventing hyperactivation and adjusting the intensity of the immune response by causing clonal anergy of lymphocytes. CTLA4 operates on several cellular mechanisms, such as CD28 antagonism in binding to B7 ligands and other co-stimulatory ligands, recruiting T cell inhibitory effectors, and preventing immune stimulating antibody conjugate3. Anti-CTLA4 will block all of the above, and mediate a stronger immune attack13.

Unlike the genetic code, the TCR repertoire is dynamic; its state, made readable with RNA sequencing, indicates the current status of the immune system. As such, several methods have been developed to quantify the TCR repertoire and its diverse components14,15,16, and many computational and analytical tools are constantly being developed and used to analyze these repertoires17,18,19. Many works have highlighted the importance of a diverse TCR repertoire for the immune system to perform optimally1,20, and while these works track and document changes to the repertoire, the supporting data is limited to pre- and post- treatment time points at best. Additionally, since a phenotypical change, such as reduction in tumor size, can occur some time after the peak of the immunological response, it can be difficult to create a snapshot of the immune state at the most relevant time. Moreover, the high variability in the favorable reaction of cancers to immunotherapy, makes it even challenging for researcher to quantify the true dynamics of the immune response21.

In light of the above, we designed and carried out an experiment aimed at creating a temporal profile of the dynamics of the immune response as it occurs. These data, made available here, would allow researchers from all backgrounds to observe the changes over time as they develop and eventually subside in response to treatment with anti-CTLA-4. Analyzing the behavior of the immune response, presented here in several forms, from raw reads to clonotype listings, would benefit efforts to better understand the immune response in cancer, and hopefully lead to better treatments.


Experimental procedure

In this experiment we used the mouse tumor cell line MC38, a grade III mouse colon adenocarcinoma, induce in a syngeneic mouse model. An estimated 5·105 MC38 cells were subcutaneously implanted to immunocompetent C57BL/6 N mice of the C57Bl/6NAnNCrl strain. The untreated group (5 mice) was dosed with 10 ml/kg/day of vehicle on day 1, 3 and 6 of the experiment. The treated group (15 mice) was dosed with the immune checkpoint inhibitor antibody anti-mCTLA4 (5/2.5 mg/kg/day at 1/3, 6). Following treatment, blood samples for the isolation of PBMCs were taken at days 0, 7, 14, and 21. The experimental timeline is shown in Fig. 3.

Fig. 3
figure 3

Experimental timeline. Treated mice (colored red) underwent injection of anti-CTLA-4 clone 9H10 on days 1, 3, and 6. Control mice (colored green) were injected with vehicle at similar times. Blood samples were taken on days 0, 7, 14 and 21. Tumor (when present) and spleen samples were collected at animal sacrifice at the end of the experiment. Numbers indicate the number of mice viable for sampling.

Animal health and ethics

All experiments and protocols were approved by the animal welfare body at CR Discovery Research Services Germany and the local authorities, and were conducted according to all applicable international, national and local laws and guidelines. Only animals with unobjectionable health were selected to enter testing procedures. The animals were delivered at the age of four to six weeks and were used for experiments after at least one week of acclimatization. Animals were arbitrarily numbered during tumor implantation using radio frequency identification transponders. Each cage was labeled with a record card indicating all relevant experimental details.

Animal housing conditions

Animals were housed in individually ventilated type II and III cages, based on group size, and were kept under a 14 L:10D artificial light cycle. The temperature inside the cages was maintained at 25 °C with a relative humidity of 40–70% and 60–65 air changes/hour in the cage. Dust-free bedding consisting of aspen wood chips with approximate dimensions of 5 × 5 × 1 mm and additional nesting material was used. The cages, bedding, and the nesting material were changed weekly. The animals were fed autoclaved Teklad Global 19% Protein Extruded Diet (T.2019S.12) from Envigo RMS SARL and had access to sterile filtered and acidified (pH 2.5) tap water that was changed per week. Feed and water were provided ad libitum. All materials were autoclaved prior to use.

Immunotherapy treatment

A dosing solution with a concentration of 0.5 mg/ml for dosing at 5 mg/kg/day was freshly prepared on dosing day 1 by diluting 203 μL of anti-mCTLA4 stock solution with 2797 μL of PBS. A dosing solution with a concentration of 0.25 mg/ml for dosing at 2.5 mg/kg/day was freshly prepared on dosing day 3 and 6 by diluting 101 μL of anti-mCTLA4 with 2899 μL of PBS. All dosing solutions were administered in a dose volume of 10 ml/kg.

PBMCs purification from blood samples

100μL of blood was collected by retro-orbital sinus puncture under isoflurane anesthesia. PBMCs were prepared from 100 μL blood by spinning down cells and lysing pellets with ACK lysis buffer (150 mM ammonium chloride, 10 mM potassium bicarbonate, 0.1 mM EDTA, pH 7.2–7.4) to remove erythrocytes. The resulting PBMC were frozen in 10% DMSO/90% FCS at 80 °C until shipment.

Solid sample collection

Tumors (where still visible) were collected immediately after euthanasia, and directly transferred to liquid nitrogen (snap-frozen samples). Spleens were collected immediately after euthanasia, and directly transferred to liquid nitrogen (snap-frozen samples).

RNA purification from blood, tumor and spleen samples

PBMCs samples were thawed, and their total RNA content extracted using the RNeasy mini kit (Qiagen). Tumor and spleen samples were thawed, weighted and applied to Trizol (Invitrogen) purification procedure according to the manufacturer protocol. The obtained RNA product was then evaluated for its integrity value (RIN) to better standardize RNA quality. Next, RNA concentrations were determined by qubit, which provides an accurate method for the quantitation of low abundance RNA samples.

Library preparation

A fixed total RNA concentration of 250 ng from each sample was subjected to the SMARTer TCR profiling kit (Takara); a single cDNA strand is synthesized at the first round of reverse transcription, using poly T primer for the 3’ end and a smarter oligo (Switching Mechanism at 5’ End of RNA Template) for the 5’ ends, catalyzed by MMLV-derived SMARTScibe Reverse Transcriptase. By employing 5’ RACE-like approach, the SMARTer TCR profiling kit allows capturing complete V(D)J variable regions of TCR alpha and beta chains and in the following step, each fragment was exponentially increased by PCR. On each dsDNA fragment, dedicated adapters containing flow cell-binding sites, P5 and P7, were inserted to allow the enriched library fragments to attach to the cell flow surface. The TCR sequencing library was size-selected and purified using AMPure XP beads. The generated libraries were measured for their DNA concentration by qubit and assessed for their size by Tapestation or Bioanalyzer.

Sequencing and reads processing

TCR sequencing was performed on an Illumina Miseq sequencer using the 600-cycle Miseq reagent kit v3 (Illumina) with pair-end, 2 × 300 base pair reads. Library pool molarity was 4 nM. Transforming raw reads to alpha and beta CDR3 sequences was done by using MiXCR, a universal framework that processes big immunome data from raw sequences and output quantitated clonotypes (Bolotin et al.18). For the alpha chain we obtained 13·106 reads with an average of 171628 copies and 25159 unique amino-acid clones, and for the beta chain we obtained 18.6·106 with an average of 245867 copies and 49725 unique amino-acid clones. In order to avoid bias originating from different sample sizes, we sub-sampled 97048 sequences from each sample, as the number of clones in the smallest sample.

Randomization and blinding

The initial division of treatment (15 mice, labeled F to T) and control groups (5 mice, labeled A to E) was done randomly. At the end of the experiment, each mouse was assigned to a response group according to the changes to its tumor size in response to treatment: Early responders (mice L, N, P, Q, R) – mice that showed early response to treatment and reduction in tumor volume; Mid-responders – mice that showed medium response to treatment and reduction in tumor volume (mice J, M, O, S, T), and Late responders - mice that showed later response to treatment and reduction in tumor volume (mice F, G, H, I, K).


To generate the alpha and beta chains clonotyping lists, the tool MiXCR was used18. MiXCR is a universal framework that processes big immunome data from raw sequences to quantitated clonotypes. MiXCR was used with the following arguments: −−speciesmmu−−libraryde f ault −−threads12−−reportreportprnaseqOvParameters.geneFeatureToAlign = VTranscriptWithPOvParameters.parameters. f loatingLe f tBound = true−OjParameters.parameters. f loatingRightBound = f alseOcParameters.parameters. f loatingRightBound = true. Sample code for similar use is provided in the code availability section of this work.

Data Records

The TCR-Seq data were deposited in the NCBI Sequence Read Archive (SRA) under the accession number PRJNA65834822. This data appears as zipped FASTQ files, as is standard for next generation sequencing results. This data represents the TCR sequencing of all data points: blood samples taken on days 0, 7, 14, and 21 from treated and control mice, as well as TCR sequencing of tumors and spleens taken at the end of the experiment.

The Alpha and Beta chain TCRs were deposited on FigShare under the following (alpha) and (beta). These datasets appear as tabular text files, and include information on each clonotype in the data. This includes a unique clone ID, clone count, the clone’s fraction out of the total number of clones, the actual clonal sequence in base-pairs, and more. For complete details of the fields populating this data, we suggest reading the documentation at Note that this table is sorted according to the clone count by default, with the most prevalent clonotypes appearing on top.

The metadata for each clonotype dataset were deposited on FigShare under the following (alpha) and (beta). These data sets appear as tabular text files, and include meta-information on each data point. This includes the sample name (matching the RNA-Seq data files), the sample ID, Time point, a mouse identifier, the group (either treated or control), the tissue from which the sample was taken (Blood, Tumor or Spleen), and how quickly, if at all, a given mouse responded to treatment (as determined by tumor size measurements throughout the experiment)

The entire data collection process, as well as the data made available in this work, is presented as a work flow in Fig. 4.

Fig. 4
figure 4

Data processing flow chart. Pink rectangles indicate processes used to generate data, blue rectangles indicate data sets available in this work.

Technical Validation

TCR-seq validation

To validate the raw sequencing data, we used FASTQC27, A quality control tool for high throughput sequence data. Results showed that all sequences had high per sequence quality scores”. Additionally, all samples had valid percentage of base calls at each position for which an N was called, and all samples were of the same length of 301 basepairs, as expected. It should be noted that as these sequences are of T cell receptors only, and do not align a standard mouse genome, as is evident in the low alignment score of all samples, as well as the presence of promoters in the data.

Clonotyping validation

To demonstrate that the clonotype data is valid, we used Immunarch28 to measure the number and length of the quantified clonotypes for each chain. As can be seen in Fig. 5, there is no significant difference in number of clonotypes among between the treatment groups, nor is there a significant difference between the number of clonotypes between the response groups.

Fig. 5
figure 5

Alpha and Beta number of clonotypes by response and group. Panels on the left (A,C) indicate the distribution of the number of clonotypes belonging to the Alpha chain of the TCR, colored by response (top row) and treatment group (bottom row). Panels on the right (B,D) indicate the distribution of the number of clonotypes belonging to the Beta chain of the TCR, in a similar manner.

Similarly, as can be seen in Fig. 6, there is no significant difference in the length of the CDR3 section of the clonotypes between the treatment groups, nor is there a significant difference in the length of the CDR3 section of the clonotypes between the response groups.

Fig. 6
figure 6

Alpha and Beta CDR3 length distribution by response and group. Panels on the left (A,C) indicate the distribution of the length of the CDR3, in base-pairs, belonging to the Alpha chain of the TCR, colored by response (top row) and treatment group (bottom row). Panels on the right (B,D) indicate the distribution of the length of the CDR3, in base-pairs, belonging to the Beta chain of the TCR, in a similar manner.

The above emphasizes the validity of the data, and allows for unbiased comparison between different treatment or response groups.

Usage Notes

Reuse of the data provided herein is possible either by re-construction of the clonotypes from the raw sequencing files, or using our built clonotyping list and continue the analysis from there. Users should note that when constructing their own clonotyping data, different arguments can be used to obtain fields of interest in the data.

The raw repertoire sequencing data could be used in several ways; first, clonotype quantification can be performed in several ways and under different conditions29. We recommend the use of MiXCR (see code examples section for our application), but other tools such as IgBLAST30 and ImRep ( have useful features. Alternatively, the data can be used like any other short-read RNA sequencing data, although generated from T cells only; such a work-flow would include alignment, assembly and quantification of gene expression. While tools for each step exist, we recommend using tools able to perform all of these steps, as exemplified in a recent review by Stark et al.31.

Once clonotyping is performed, these data, both of the alpha and beta chains, could be used with many existing, specialized pipelines. These include Immunarch28, Immunoseq analyzer32, Immcantation Framework33 and VDJtools17. A recent useful review by Arunkumar et al. compares these tools and others19. Analysis such as this can be done to estimate the number of copies a specific clonotype has, compare clonotypes to those found in external databases, or attempt to tie the presence of clonotypes with response to treatment, or lack thereof.

Finally, the data presented here could be used with machine learning approaches to predict response to treatment. This could be done either with dedicated platforms such as ImmuneML34, or with any other tool or approach that could uncover a connection between response to treatment and the composition of the repertoire. Any such methods would benefit from the temporal nature of the data, and could support approaches dedicated to precise treatment of individual patients. We would recommend the use of time-series analysis to the data presented here.