Introduction

The recent development of high-throughput DNA sequencing technologies has driven rapid progress in genetics and genomics research, enabling us to address a multitude of biological questions. In population and medical genetics, including cancer genomics, whole-genome and whole-exome sequencing, as well as genome-wide association studies using single-nucleotide polymorphisms (SNPs), have allowed us to characterize disease-associated genetic loci and variants [1]. In the past decade, the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA) have identified genetic drivers of disease through comprehensive genomic characterization of various types of human cancer [2, 3].

Although germline and somatic mutations have been extensively analyzed, disease-associated changes in the molecular makeup of immune cells have not been characterized in depth. The immune system plays a critical role in various biological and pathological conditions, such as infectious disease, autoimmune disease, drug-induced skin and liver toxicity, food allergy, and rejection after organ transplantation. In addition, recent developments in cancer immunotherapy have highlighted the importance of host immune cells in the fight against cancer. For example, through a process known as immune surveillance, the immune system eliminates nascent tumor cells from our body [4] and serves as a primary defense against cancer. The recent success of antibodies targeting immune checkpoint molecules such as cytotoxic T lymphocyte antigen 4 (CTLA-4), programmed cell death protein 1 (PD-1), and its ligand PD-L1 has clearly demonstrated that our immune system has the ability to eradicate cancer cells [5]. However, the molecular mechanisms by which these immune checkpoint-blocking antibodies lead to the killing of tumor cells are still not fully characterized.

The cells of the adaptive immune system, T and B lymphocytes, are selectively activated when their transmembrane receptors recognize an antigen. These receptors, known as T-cell receptors (TCRs) and B-cell receptors (BCRs), respectively, are important for inducing the various immunological reactions described above. TCRs and BCRs are also known as the ‘signatures’ of T and B lymphocytes. Given the extremely high complexity of immune responses in our body, a comprehensive approach to fully characterize changes in TCR and BCR repertoires is urgently needed.

During lymphocyte differentiation, the genes encoding TCRs and BCRs, which comprise variable (V), diversity (D), and joining (J) gene segments, undergo a complex rearrangement process to generate functional receptors. Extremely diverse TCR and BCR repertoires arise from combinatorial diversity, generated by the large number of distinct V, D, and J gene segments, and from junctional diversity, created by the template-independent insertion and deletion of nucleotides at the V–D, D–J, or V–J junctions during V(D)J recombination. This rearrangement process generates the highly variable complementarity-determining region 3 (CDR3), which determines specificity and affinity for antigen recognition [6]. Theoretically, a repertoire of ~10¹⁸ different TCRs, heterodimers of α and β chains, can be generated in humans [7]. The BCR repertoire is expected to be even more diverse than the TCR repertoire because of additional somatic hypermutation [8].
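The combinatorial component of this diversity is simply the product of the numbers of segments available for each chain. The Python sketch below makes that arithmetic explicit; the segment counts are illustrative placeholders (assumptions, not authoritative germline gene counts), and junctional diversity is deliberately left out.

```python
from math import prod

def combinatorial_diversity(segment_counts):
    """Number of germline V(D)J combinations for one chain: the product of
    the number of usable segments of each type (junctional diversity ignored)."""
    return prod(segment_counts.values())

# Illustrative placeholder segment counts (assumed, not authoritative).
tcr_beta = {"V": 48, "D": 2, "J": 13}
tcr_alpha = {"V": 45, "J": 50}

n_beta = combinatorial_diversity(tcr_beta)    # 48 * 2 * 13 = 1,248
n_alpha = combinatorial_diversity(tcr_alpha)  # 45 * 50 = 2,250
n_alpha_beta = n_alpha * n_beta               # alpha/beta chain combinations

print(f"{n_alpha_beta:.2e} combinations from segment choice alone")
```

Under these placeholder counts, segment choice alone yields only about 2.8 × 10⁶ αβ combinations, many orders of magnitude below the theoretical ~10¹⁸, which illustrates how much of the potential diversity comes from junctional insertions and deletions in CDR3.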

To characterize the complex structure of our immune system and investigate the mechanisms underlying various disease conditions, two strategies have previously been applied. The first strategy detects the presence of different TCR families by comparing the usage of different TCR variable (V) segments [9]. The second strategy, called CDR3 size spectratyping, determines the clonality of the repertoire by using fluorescent primers to measure length variation of the CDR3 region within each TCR V family [10, 11]. Although these approaches provide a certain level of information, such as the proportion of each V(D)J combination, they do not reveal the nucleotide sequences of the CDR3 region. In addition, these methods are not applicable if previously unidentified V and/or J segments are present in certain individuals or populations.
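To make concrete what a spectratype does and does not capture, the following Python sketch bins reads by V family and CDR3 length, which is essentially the information a spectratyping run reports. It assumes reads have already been assigned a V gene and a CDR3 nucleotide sequence; the input format and the rule that the family is the gene name before the first '-' are hypothetical conventions for this example.

```python
from collections import Counter, defaultdict

def spectratype(records):
    """Build a spectratype-like profile: {V family: Counter(CDR3 length -> reads)}.

    records: iterable of (v_gene, cdr3_nt) tuples, e.g. ("TRBV5-1", "TGTGCC...").
    The V family is taken as the gene name up to the first '-' (an assumption).
    """
    profile = defaultdict(Counter)
    for v_gene, cdr3_nt in records:
        family = v_gene.split("-")[0]
        profile[family][len(cdr3_nt)] += 1
    return dict(profile)

# Toy reads: the two TRBV5 CDR3s differ in sequence but have the same length.
reads = [
    ("TRBV5-1", "TGTGCCAGCAGCTTA"),
    ("TRBV5-6", "TGTGCCAGTAGTCAA"),
    ("TRBV20-1", "TGCAGTGCTAGAGAGACCCAG"),
]
for family, lengths in spectratype(reads).items():
    print(family, dict(lengths))
```

Both TRBV5 reads fall into the same 15-nt bin even though their sequences differ; that lost sequence identity is exactly what spectratyping cannot recover and what direct sequencing restores.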

To overcome these technical limitations, a systematic, accurate, and unbiased analysis of TCR and BCR transcripts is needed. Although earlier high-throughput DNA sequencers were limited by short read lengths (50–100 bp) or low sequencing output, the development of longer read lengths has enabled us to analyze millions of TCRs and BCRs in a single experiment [12,13,14,15].

In this review article, we describe the current status and applications of TCR and BCR repertoire deep sequencing, and discuss the future potential of immunogenomics/immunopharmacogenomics studies (Fig. 1).

Fig. 1

Scientific areas covered by immunogenomics/immunopharmacogenomics. Immunogenomics (IG)/immunopharmacogenomics (IPG) approaches can be applied to better understand the pathogenesis of autoimmune diseases, immune rejection after organ transplantation, graft-versus-host disease (GVHD) and graft-versus-leukemia (GVL) effects after bone marrow transplantation (BMT) or hematopoietic stem cell transplantation (HSCT), food allergy, and immune responses following vaccination and cancer treatments including immunotherapy. These approaches are also important for establishing methods to predict efficacy or adverse events after treatment with various immune-modulating agents.

TCR and BCR sequencing with next-generation sequencers (NGS)

For TCR and BCR cDNA sequencing by NGS, we have applied a method known as 5′ rapid amplification of cDNA ends (5′RACE) PCR [13, 16], in which a single common forward primer is designed on the adapter sequence at the 5′ end, and a primer corresponding to the constant (C) region of the TCR, or of each BCR isotype, is used as the reverse primer (Fig. 2). Because this design avoids a multiplex set of V-segment-specific primers, it allows efficient and less-biased PCR amplification of TCR and BCR cDNA.
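One way to see why a single primer pair reduces amplification bias is a toy model in which each template is copied with a fixed per-cycle efficiency. The Python sketch below is that simplified model, not a description of our protocol: the efficiencies and cycle number are hypothetical, and real PCR efficiency also varies with amplicon length and sequence.

```python
def amplify(template_counts, efficiency, cycles=25):
    """Expected copy number after PCR, assuming each template is copied with a
    fixed per-cycle efficiency e: n * (1 + e) ** cycles."""
    return {k: n * (1 + efficiency[k]) ** cycles for k, n in template_counts.items()}

def frequencies(counts):
    """Convert counts to relative frequencies."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

# Two V genes present at equal abundance before amplification.
templates = {"TRBV5-1": 100, "TRBV20-1": 100}

# Single primer pair (adapter + C region): both templates share one efficiency.
single_pair = frequencies(amplify(templates, {"TRBV5-1": 0.9, "TRBV20-1": 0.9}))

# Multiplex V primers: hypothetical per-primer efficiencies differ slightly.
multiplex = frequencies(amplify(templates, {"TRBV5-1": 0.9, "TRBV20-1": 0.8}))

print(single_pair)  # equal starting frequencies are preserved
print(multiplex)    # TRBV5-1 becomes over-represented after 25 cycles
```

In this model a 0.1 difference in per-cycle efficiency turns an equal starting ratio into roughly 79% versus 21% after 25 cycles. Real 5′RACE libraries are not bias-free either, which is why "less-biased" rather than "unbiased" is the appropriate description.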

Fig. 2

Strategy of TCR and BCR sequencing. A PCR adapter is added to the 5′ end of cDNA reverse-transcribed from mRNA isolated from B or T cells, whole lymphocytes, white blood cells, or tissues of interest. For BCR sequencing, PCR is carried out using a forward primer for the PCR adapter (red) and a reverse primer for the constant region of each BCR isotype (green). Quantitative PCR (qPCR) is used to estimate the expression level of each BCR isotype. For TCR sequencing, PCR is carried out using a forward primer for the PCR adapter and a reverse primer for the TCR constant region. The PCR products are then subjected to BCR or TCR sequencing with next-generation sequencers.

The combination of NGS technology with this cDNA library construction method enables us to obtain an unprecedented amount of information about TCRs and BCRs. For a single sample, we can generate up to 10 million sequence reads of the expressed receptor genes, which should provide a comprehensive characterization of the TCR or BCR repertoire. We have applied this approach to clinical samples to identify specific T-cell populations that have infiltrated tumor tissues or malignant ascites [17, 18], or that have expanded or decreased during the course of treatment [19,20,21,22,23,24,25,26,27]. Furthermore, we have used it to characterize the BCR repertoire in order to diagnose and monitor B-cell-related disease conditions such as food allergy, as well as autoimmune and infectious diseases [28].

Identifying neoantigen-specific TCRs for cancer immunotherapy

Recently in the field of oncology, monoclonal antibodies that block immune checkpoint molecules such as CTLA-4, PD-1, and PD-L1 have shown significant clinical benefit and have revolutionized cancer immunotherapy [29].

Tumor tissues often evolve multiple mechanisms to escape immune-mediated destruction [30]. One of these mechanisms involves cell-surface expression of immune checkpoint molecules, such as CTLA-4, PD-1, and PD-L1 [31]. Durable responses have frequently been observed with antibodies against these molecules [32]. A higher number of somatic mutations has been shown to correlate with better clinical responses to these immune checkpoint inhibitors [33,34,35]. Higher numbers of somatic mutations are thought to generate more highly immunogenic neoantigens, which are somatic-mutation-derived, cancer-specific antigens presented on HLA molecules of cancer cells. These antigens can activate T cells, which are considered to drive the clinical effects of immune checkpoint inhibitors. There has been significant interest in harnessing the neoantigen-specific immune response in clinical settings. Leisegang et al. [36] reported that TCR-engineered T cells targeting the cancer-specific p68 mutation (mp68) eradicated very large solid tumors in mice. Adoptive transfer of patient-derived tumor-infiltrating lymphocytes (TILs) has also yielded positive clinical results in a small subset of patients with solid tumors [37, 38]. To facilitate the translation of these promising findings into clinical use, we have reported a time-efficient approach to identify neoantigen-specific TCRs using blood [39]. Notably, our protocol takes only two weeks in total, from T-cell priming with candidate neoantigen peptides to the identification of neoantigen-specific TCRs [39]. This rapid turnaround is critical for applying neoantigen-specific TCR-engineered T cells in clinical settings.

Characterizing T-cell changes during cancer immunotherapy

Given the complex and dynamic nature of the tumor immune microenvironment, it is critically important to analyze the molecular nature of immune responses in patients treated with cancer immunotherapy. In particular, the pre-existing balance between immune-active and immune-suppressive molecules in the tumor microenvironment affects clinical response (Fig. 3). In melanoma, several immunotherapies, including ipilimumab (targeting CTLA-4) and pembrolizumab and nivolumab (targeting PD-1), have been approved by the U.S. Food and Drug Administration. However, only a fraction of patients respond to these immunotherapies. Previous studies have shown that PD-L1 expression and the density and location of T cells in metastatic melanomas were predictive of response to PD-1 blockade [40]. Our analysis of the tumor TCRβ repertoire of melanoma patients undergoing nivolumab treatment revealed that oligoclonal expansion of TILs and increased expression of PD-L1, granzyme A (GZMA), and HLA-A were associated with treatment response [22].
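The text does not specify how oligoclonality was quantified in [22]. One commonly used summary, shown here as a minimal Python sketch, is clonality defined as 1 minus Pielou's evenness (Shannon entropy divided by its maximum, ln N for N observed clonotypes). The CDR3 sequences and read counts below are made up for illustration.

```python
import math
from collections import Counter

def shannon_entropy(counts):
    """Shannon entropy (natural log) of a clonotype abundance distribution."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def clonality(counts):
    """1 - Pielou's evenness: 0 when all clonotypes are equally abundant,
    approaching 1 as a few clonotypes dominate the repertoire."""
    counts = [c for c in counts if c > 0]
    if len(counts) < 2:
        return 1.0  # a single observed clonotype is treated as fully clonal
    return 1.0 - shannon_entropy(counts) / math.log(len(counts))

# Hypothetical CDR3 calls (one per read) for an oligoclonal sample.
reads = (
    ["CASSLGQAYEQYF"] * 970
    + ["CASSPDRGTEAFF"] * 10
    + ["CASRGGNTEAFF"] * 10
    + ["CASSYSTDTQYF"] * 10
)
print(round(clonality(Counter(reads).values()), 3))  # 0.879
```

Because the index is normalized by ln N, it can be compared across samples with different numbers of clonotypes, although it remains sensitive to sequencing depth.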

Fig. 3

Characterization of the tumor immune microenvironment and immune responses induced by cancer immunotherapy. The pre-existing balance between immune-active and immune-suppressive cells/molecules in the tumor microenvironment affects clinical responses to immune checkpoint blockade in cancer immunotherapy. When cancer progresses, the immune-suppressive side is dominant over the immune-active side in the tumor microenvironment. After treatment with immune checkpoint blockade, a poor clinical response is expected if the immune-suppressive side remains dominant, whereas a favorable clinical response is expected if the immune-active side becomes dominant.

Because TCR sequencing can provide comprehensive information about T lymphocytes in cancer tissues, blood, and cancer-related effusions, these approaches should enable us to further characterize the detailed mechanisms of immune responses during these treatments.

Characterizing T-cell changes during non-immune-targeted cancer therapy

Using NGS to evaluate the immune components in cancer patients treated with non-immune-targeted therapies has yielded extremely useful information for disease monitoring and clinical outcome prediction. The relationship between TILs and patient prognosis was first reported in a study of epithelial ovarian cancer [41]. Patients with an increased fraction of CD3+ TILs had better survival, indicating that the immune response may have an important role in determining clinical outcomes. Subsequent characterization of the infiltrating lymphocyte fraction in these cases revealed the importance of CD8+ T-cell populations in determining better outcomes [42]. It was hypothesized that the lack of effector CD8+ T cells in malignant pleural effusion resulted from defects in CD8+ T-cell recruitment due to the immunosuppressive effects of the disease [43], allowing the malignant cancer cells to proliferate. TCR sequencing may help to further differentiate the T-cell populations present in a particular environment. Compared with conventional flow cytometry or immunological assays, comprehensive characterization of the T-cell repertoire by NGS provides information about the specific subpopulations present in tumor tissues or ascitic/pleural effusions, enabling us to decipher which populations play key roles in treatment response. In our study of malignant effusions from ovarian cancer patients, we observed enriched T-cell clones that were not shared with those found in tumors [17]. Furthermore, the abundant TCR clonotypes in CD4+, CD8+, and CD4+CD25+ T-cell populations in these samples were mutually exclusive, indicating that the immune microenvironments in tumors and ascites are entirely distinct [17].
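Statements that clonotypes are shared or mutually exclusive between sorted populations rest on an overlap comparison. A minimal Python sketch, assuming one CDR3 call per read and using the Jaccard index on each population's top-n clonotypes (both the choice of index and the cutoff n are assumptions, not the analysis of [17]); the CDR3 strings are toy placeholders:

```python
from collections import Counter
from itertools import combinations

def top_clonotypes(cdr3_reads, n):
    """Set of the n most abundant CDR3 sequences in a list of per-read calls."""
    return {cdr3 for cdr3, _ in Counter(cdr3_reads).most_common(n)}

def jaccard(a, b):
    """Jaccard index of two sets: shared / union (0 = disjoint, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def pairwise_overlap(populations, n=100):
    """Jaccard overlap of the top-n clonotypes for every pair of populations."""
    tops = {name: top_clonotypes(reads, n) for name, reads in populations.items()}
    return {(x, y): jaccard(tops[x], tops[y]) for x, y in combinations(sorted(tops), 2)}

# Toy sorted populations with placeholder CDR3 strings.
populations = {
    "CD4": ["CASSA"] * 5 + ["CASSB"] * 3,
    "CD8": ["CASSC"] * 4 + ["CASSD"] * 2,
    "CD4CD25": ["CASSE"] * 6 + ["CASSA"],
}
print(pairwise_overlap(populations, n=2))
```

A Jaccard value of 0 for every pair corresponds to mutually exclusive top clonotypes; the result depends on n, so several cutoffs are worth checking.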

The presence of TILs is associated with favorable clinical outcomes in several tumor types. In bladder cancer, a small cohort study documented increased infiltration of both B cells and T cells into bladder tissue after intravesical BCG treatment, and these patients had a lower incidence of disease recurrence [44]. In our study investigating immune regulation in muscle-invasive bladder cancer, we found that oligoclonal expansion of TILs was significantly associated with longer recurrence-free survival (RFS) in patients who underwent definitive surgery [20]. Higher neoantigen load was also associated with longer RFS. These molecular patterns may thus provide useful prognostic markers and serve as tools for predicting disease recurrence after treatment.

Regulatory T cells (Tregs) in tumors have been associated with poor clinical outcomes in various cancer types [45]. However, one study found that high numbers of Tregs in follicular lymphoma (FL) tissues were associated with better clinical outcomes [46]. A subsequent report concluded that a specific intrafollicular Treg pattern, rather than the number of Tregs, was correlated with poor survival [47]. In addition, it remains unclear whether Tregs suppress the immune system in an antigen-specific manner. Interestingly, we found that Tregs characterized from pretreatment FL biopsy specimens were highly clonal [18]. In line with previous research, perifollicular CD8+ T cells in tumors showed stronger clonal expansion than intrafollicular CD8+ T cells, suggesting that antigen-specific CD8+ T cells capable of recognizing FL cells may be excluded from malignant follicles by an as-yet undetermined mechanism [18].

While nivolumab and pembrolizumab have been approved for second-line treatment of recurrent or metastatic squamous cell carcinoma of the head and neck (SCCHN), locoregionally advanced SCCHN is currently treated with chemoradiation therapy and surgery. We investigated tumor tissues from patients with locoregionally advanced SCCHN prior to chemoradiotherapy to examine the possible roles of the host immune system in SCCHN [26]. Interestingly, clonal expansion of T cells was significantly stronger in human papillomavirus (HPV)-negative tumors than in HPV-positive tumors. HLA-A expression levels were also significantly higher in HPV-negative tumors. In addition, higher granzyme B (GZMB) levels in tumor tissues were significantly correlated with longer RFS, independent of other clinicopathologic parameters. These findings imply differences in the immune microenvironment between HPV-negative and HPV-positive tumors. Moreover, pretreatment levels of immune markers such as GZMB might serve as predictors of recurrence risk in patients with locoregionally advanced disease.

Cryoablation is used to treat renal-cell carcinoma (RCC) as well as other cancer types. This type of treatment is expected to attract immune cells through inflammatory signals. Studies of cryoablation in RCC and prostate cancer have indeed demonstrated its ability to induce a tumor-specific cytotoxic T-cell response [48, 49]. Moreover, a study in a mouse melanoma model showed that cryoablation increased infiltration of macrophages and dendritic cells (DCs) into the tumor microenvironment [50]. We analyzed the TCRβ repertoire in tumor tissues and blood samples from kidney cancer patients before and 3 months after cryoablation to better characterize the tumor and peripheral immune responses during the treatment. Certain T-cell clones expanded in the tissues after treatment, and some of these clonotypes also expanded in peripheral blood. This suggests that cryoablation can induce not only local but also systemic T-cell responses. In line with previous findings, we observed an increase in CD11c+ cells (macrophages and DCs) in the post-cryoablation tissues. Interestingly, CD8 expression levels (reflecting CD8+ T-cell infiltration) were significantly associated with antigen presentation, as measured by HLA-A expression. Altogether, these molecular changes have enabled us to better characterize the immune-mediated response to cryoablation (Fig. 4). In the clinical context, expanded TCR sequences that emerge after cryoablation might be applied to cancer-antigen-specific TCR-engineered T-cell therapy.
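Identifying clonotypes that expand after treatment, and asking which of them also expand in blood, can be sketched as a frequency fold-change filter. In this hypothetical Python example the thresholds (`min_fold`, `min_freq`) and the pseudo-frequency for clones undetected before treatment are arbitrary choices; published analyses may instead apply statistical tests to read counts.

```python
def frequencies(counts):
    """Convert clonotype read counts to relative frequencies."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

def expanded_clones(pre, post, min_fold=4.0, min_freq=1e-3, pseudo=1e-6):
    """CDR3s whose post-treatment frequency is at least min_freq and at least
    min_fold times their pre-treatment frequency (pseudo avoids division by
    zero for clonotypes not detected before treatment)."""
    f_pre, f_post = frequencies(pre), frequencies(post)
    return {c for c, f in f_post.items()
            if f >= min_freq and f / (f_pre.get(c, 0.0) + pseudo) >= min_fold}

# Toy read counts per CDR3 before and after treatment.
tissue_pre, tissue_post = {"A": 10, "B": 990}, {"A": 500, "B": 500}
blood_pre, blood_post = {"A": 1, "C": 999}, {"A": 100, "C": 900}

shared = expanded_clones(tissue_pre, tissue_post) & expanded_clones(blood_pre, blood_post)
print(shared)  # clonotypes expanded both locally and systemically
```

The intersection step mirrors the reasoning in the text: a clonotype that expands in both the treated tissue and peripheral blood is a candidate marker of a systemic response.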

Fig. 4

Possible mechanism of cryoablation-induced immune-mediated elimination of cancer cells in distant lesions. Cryoablation damages tumor cells while likely leaving their proteins intact. Damaged cells release inflammatory signals such as cytokines and chemokines, which stimulate infiltration of antigen-presenting cells (APCs) and lymphocytes into the tumor microenvironment. Dead cancer cells are phagocytosed by APCs, and proteins from the cancer cells are processed and presented on HLA molecules of APCs as tumor-antigen-specific peptides. These antigens are recognized by tumor-antigen-specific CD8+ lymphocytes, resulting in activation and proliferation of tumor-antigen-specific T cells. Expanded T cells may enter the circulation and eliminate cancer cells in distant tumor lesions.

Applications of TCR/BCR sequencing in other diseases

Pathogenesis of autoimmune diseases

Tissue-infiltrating lymphocytes in affected lesions have an important role in the pathogenesis of autoimmune diseases. For example, CD20+ B cells and CD8+ T cells have been identified as the major infiltrating lymphocytes in the thyroid of patients with autoimmune thyroid disease (AITD) [51]. Infiltration of CD4+ and CD8+ T cells is related to progression of renal dysfunction, and enrichment of CD3+CD4−CD8− (double-negative) T cells in salivary glands is suggested to contribute to tissue damage in patients with systemic lupus erythematosus [52,53,54]. Furthermore, persistent activation of certain T-cell and B-cell clones has been observed in various autoimmune disease conditions. Hence, characterizing human immune cells, including type 1 T helper (Th1) cells, type 2 T helper (Th2) cells, and B cells, in the blood and/or affected lesions of patients with autoimmune diseases should provide valuable information about the molecular pathogenesis of these diseases. In this regard, quantifying unique T-cell and B-cell subpopulations by high-throughput TCR and BCR sequencing over time and across tissue sites should contribute to a better understanding of these diseases. In addition, this knowledge can be applied to improve diagnosis, prognosis, and the selection of patients for appropriate therapies [55].

For example, inflammatory bowel disease (IBD), which includes Crohn’s disease (CD) and ulcerative colitis, is a chronic inflammatory condition of the intestines that is likely caused by dysfunction of innate and adaptive immune responses, possibly related to commensal microbiota [56]. In these conditions, oligoclonally expanded CD4+ T cells were persistently present in preoperative and postoperative disease lesions [56]. Our study of inflamed tissue specimens from CD patients demonstrated oligoclonal expansion of T cells whose TCR sequences were not observed in corresponding normal tissues [57], suggesting that certain disease-causative T cells could be responsible for intestinal inflammation in CD. Further analysis of sorted T-cell subpopulations will provide a better understanding of the immune dysfunctions that contribute to CD/IBD pathogenesis.

Pathogenesis of food allergy

In the United States, ~15 million people suffer from food allergies, including 1 in 13 children. Interestingly, the antigens that cause food allergy show distinct geographic patterns [58]. For instance, peanut allergies kill 100–150 people in the US each year [59] but are uncommon in Asian countries; in Japan, the estimated number of deaths in children from peanut allergies was 14 in the seven years between 1995 and 2001 [60]. On the other hand, shellfish allergy is highly prevalent in Asian countries such as Japan, the Philippines, and Thailand [61]; population surveys show a prevalence of 5.23% among teenagers in Singapore [58]. Allergy can be divided into two major phases: the sensitization phase and the effector phase. In the sensitization phase, dendritic cells presenting food-derived antigens on their HLA molecules stimulate CD4+ Th2 cells whose TCRs recognize these antigens, and the activated CD4+ T cells promote B-cell production of allergen-specific IgE [62]. These T cells also help maintain allergen-specific IgE levels in the late phase. A number of studies have suggested various potential roles for Th1, Th9, Th17, and Th22 effector T cells in food allergy [63,64,65]. Hence, quantitative analysis of TCR and BCR repertoires in patients with food allergy will help to determine inter-individual differences and intra-individual changes over time. We conducted a preliminary analysis of the BCR repertoire in patients with peanut allergy before and after oral immunotherapy (OIT) to determine changes in the BCR repertoire during the course of treatment [28]. Through OIT, certain immunoglobulin heavy chain alpha (IGHA) and IGH gamma (IGHG) clones became oligoclonally enriched, and the overall diversity of the BCR repertoire was significantly reduced. Identifying the specific BCR sequences involved in the development of peanut allergy is a critical next step, as this information can shed light on the detailed molecular pathology of food allergy.

Pathogenesis of GVHD after HCT

Hematopoietic cell transplantation (HCT) is now one of the essential treatment options for patients with hematologic diseases. Graft rejection and graft-versus-host disease (GVHD) are major obstacles to successful HCT. T cells are known to play a key role in both graft rejection and GVHD [66, 67].

The immune response of host cells against donor cells is initiated by recognition of antigen(s) presented on HLA molecules of the donor cells; this is referred to as the sensitization stage. The host T cells recognize the donor cells and begin to proliferate. These cells then destroy and eliminate the donor cells from the host, which is termed the effector stage [68]. There are three general forms of graft rejection: hyperacute, acute, and chronic [68]. Hyperacute rejection normally occurs within minutes to hours after transplantation due to pre-existing antibodies and effector T cells against the graft cells. Acute rejection usually occurs within six months after transplantation, whereas chronic rejection manifests months to years after transplantation. Several studies have shown that recipient-derived cytotoxic T cells might cause acute graft rejection through recognition of HLA-A, HLA-B, or HLA-C antigens of donor cells [69,70,71]. Mismatches at these HLA loci increase the risk of acute graft rejection.

GVHD occurs when donor T cells attack host tissue and the host cells are unable to mount an immunological response against the graft cells [72]. GVHD can be classified into two forms, acute and chronic, which differ in symptoms, time of onset, and target tissues. Acute GVHD usually occurs within 3 months after transplantation and often causes severe damage to the skin, liver, and gastrointestinal tract. Chronic GVHD is observed more than 100 days after transplantation and displays a more diverse set of clinical manifestations, similar to systemic autoimmune syndromes [72, 73]. The development and severity of GVHD likely depend on various factors, including age and graft source. Generally, the incidence of GVHD is higher in patients whose donor is unrelated or whose HLA types are not well matched. Older patients are likely to have a higher risk of developing GVHD than younger patients, and having a donor of a different sex also increases the risk. The pathophysiology of GVHD is very complex. Before transplantation, patients receive a conditioning regimen consisting of chemotherapy and/or radiotherapy to suppress host immune function and prevent graft rejection. However, this treatment may itself damage tissues and induce inflammation. Tissue damage and proinflammatory cytokines then trigger the activation of donor-derived T cells, which mediate cytotoxicity against target host cells [73]. Several studies have shown that activated donor-derived cytotoxic T cells play an important role in the development of acute GVHD [74, 75]. Moreover, lower levels of regulatory T cells [76] have been suggested to be associated with acute and chronic GVHD [77].

We have performed a comprehensive high-throughput analysis of the TCR repertoire in patients with acute GVHD and in patients without GVHD using a next-generation sequencing platform. We found that certain T-cell clones were activated in the course of GVHD development. Moreover, stronger oligoclonal enrichment was observed in GVHD patients around the time of GVHD diagnosis than in controls [27] (Fig. 5), suggesting that clonal expansion soon after HSCT might serve as a biomarker of GVHD risk. Characterization of the TCR repertoire by deep sequencing is therefore informative and may contribute to a better understanding of the molecular mechanisms of GVHD pathophysiology.
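A simple way to follow oligoclonal enrichment across serial samples is the fraction of reads occupied by the top-k clonotypes at each time point. The Python sketch below flags time points where that fraction rises well above the first sample; the baseline choice, `k`, and `fold` are hypothetical parameters, and this is neither the analysis used in [27] nor a validated GVHD biomarker.

```python
def top_k_occupancy(counts, k=10):
    """Fraction of all reads that belong to the k most abundant clonotypes."""
    total = sum(counts.values())
    top = sorted(counts.values(), reverse=True)[:k]
    return sum(top) / total if total else 0.0

def flag_expansion(timecourse, k=10, fold=2.0):
    """Time points whose top-k occupancy is at least `fold` times that of the
    first sample, which is treated here as the post-transplant baseline."""
    baseline = top_k_occupancy(timecourse[0][1], k)
    return [t for t, counts in timecourse[1:]
            if top_k_occupancy(counts, k) >= fold * baseline]

# Toy series of (day after transplantation, clonotype read counts).
series = [
    (14, {"A": 10, "B": 10, "C": 10, "D": 10}),
    (28, {"A": 80, "B": 10, "C": 5, "D": 5}),
    (42, {"A": 30, "B": 30, "C": 20, "D": 20}),
]
print(flag_expansion(series, k=1))  # [28]
```

Top-k occupancy is easy to interpret clinically (what share of the repertoire a handful of clones occupy), but in practice k should be chosen relative to sequencing depth.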

Fig. 5

Immune characterization of graft-versus-host disease (GVHD). GVHD occurs when donor T cells recognize host-cell-specific antigens. When host-antigen-specific T cells recognize antigens in certain tissues, the T cells are activated and proliferate in host tissues. Such expanded T cells can be detected in blood samples after hematopoietic stem cell transplantation (HSCT) or bone marrow transplantation (BMT) before acute GVHD symptoms become overt.

Challenges and future directions

Sample quality and sorting of different populations of immune cells

Sequencing data from sorted immune cells can enhance our understanding of the roles of specific cell subsets. However, obtaining a sufficient amount of tissue to quantify the functional subtypes of T cells is challenging in certain disease conditions. Because the TCR repertoire of each T-cell subset is likely to be distinct, we may instead analyze the TCRs of each T-cell subset in blood and determine which subsets have infiltrated the cancer tissue. Indeed, in our analysis of serial tumor and peripheral blood mononuclear cell (PBMC) samples from non-small cell lung cancer (NSCLC) patients receiving anti-PD-1 therapy, we found in one case that the dominantly expanded T-cell clones in peripheral blood were concordant with those found within the tumor [78]. Thus, TCR sequencing of sorted blood samples may provide information about TIL subpopulations when the two repertoires are analyzed comparatively.
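Comparing blood and tumor repertoires is often done with an abundance-weighted overlap index, so that dominantly expanded clones drive the result. As an illustrative Python sketch (the choice of the Morisita-Horn index is an assumption, not necessarily the measure used in [78]), with frequencies pᵢ and qᵢ of each clonotype in the two samples, the index is 2 Σ pᵢqᵢ / (Σ pᵢ² + Σ qᵢ²):

```python
def morisita_horn(x, y):
    """Morisita-Horn overlap of two clonotype count dicts: 0 = no shared
    clonotypes, 1 = identical relative abundances."""
    X, Y = sum(x.values()), sum(y.values())
    if X == 0 or Y == 0:
        return 0.0
    p = {k: v / X for k, v in x.items()}
    q = {k: v / Y for k, v in y.items()}
    shared = p.keys() & q.keys()
    numerator = 2 * sum(p[k] * q[k] for k in shared)
    denominator = sum(v * v for v in p.values()) + sum(v * v for v in q.values())
    return numerator / denominator

# Toy read counts per CDR3: clone "A" dominates both compartments.
blood = {"A": 90, "B": 10}
tumor = {"A": 80, "C": 20}
print(round(morisita_horn(blood, tumor), 3))  # 0.96
```

Because the squared frequencies are dominated by the largest clones, two samples can score high on this index even when most rare clonotypes are not shared, which matches the question of whether dominant clones are concordant.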

One major advantage of our cDNA sequencing approach is that T cells do not have to be isolated from the cancer tissue, because TCRs are expressed only in T cells. This makes characterization of the TCR repertoires of TILs much easier. However, to obtain a more complete picture of the immune response in cancer, we will undoubtedly need a holistic approach that integrates TCR repertoire analysis with somatic-mutation analysis, HLA typing, and analysis of proteasome processing in the antigen presentation machinery. This poses a major technical challenge but will be crucial to accurately assess the cellular mechanisms that determine effective immune function in the cancer microenvironment.

Data analysis

Owing to the complexity and diversity of the TCR repertoire, analysis of TCR NGS data is a very challenging bioinformatics task. Several publicly available software packages exist for TCR NGS data analysis, including IMGT/HighV-QUEST [79], MiXCR [80], Decombinator [81], and IgBlast [82]. IMGT/HighV-QUEST, developed by the international ImMunoGeneTics information system (IMGT), is a widely used online system for the standardized analysis of collections of rearranged TCR and BCR nucleotide sequences. However, its input is limited to fewer than 500,000 reads per run. Decombinator [81] is based on the classic pattern-matching approach of Aho and Corasick and includes a novel modification to correct for sequencing errors. MiXCR also takes sequencing quality into account; it further corrects PCR and sequencing errors and rescues low-quality sequencing data. In addition, the VDJtools package, which accepts MiXCR output, is available for downstream analysis and data presentation.
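The classic Aho-Corasick algorithm finds all occurrences of many short patterns in a single pass over a read, which suits searching reads for short sequences characteristic of individual V and J segments. The Python sketch below implements the generic algorithm with hypothetical tag sequences; it is not Decombinator's code and omits its error-correcting modification.

```python
from collections import deque

class AhoCorasick:
    """Generic multi-pattern exact matcher (Aho and Corasick, 1975)."""

    def __init__(self, patterns):
        # patterns: {name: sequence}; state 0 is the root of the trie.
        self.goto = [{}]   # per-state transitions: character -> next state
        self.fail = [0]    # failure link: longest proper suffix that is a trie state
        self.out = [[]]    # names of patterns that end at each state
        for name, seq in patterns.items():
            state = 0
            for ch in seq:
                if ch not in self.goto[state]:
                    self.goto.append({})
                    self.fail.append(0)
                    self.out.append([])
                    self.goto[state][ch] = len(self.goto) - 1
                state = self.goto[state][ch]
            self.out[state].append(name)
        # Breadth-first traversal so failure links of shallower states exist first.
        queue = deque(self.goto[0].values())
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                # Inherit matches that end at the failure state (suffix patterns).
                self.out[nxt] = self.out[nxt] + self.out[self.fail[nxt]]

    def search(self, text):
        """Yield (end_index, pattern_name) for every occurrence in text."""
        state = 0
        for i, ch in enumerate(text):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for name in self.out[state]:
                yield i, name

# Hypothetical V and J "tags" (not real germline sequences).
tags = {"tagV1": "ACGT", "tagV2": "CGTA", "tagJ1": "GTT"}
matcher = AhoCorasick(tags)
print(list(matcher.search("TACGTAGTT")))  # [(4, 'tagV1'), (5, 'tagV2'), (8, 'tagJ1')]
```

The failure links are what let the matcher report the overlapping tags "ACGT" and "CGTA" without rescanning the read, so the cost grows with read length plus the number of hits rather than with the number of tags.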

Our group has developed a novel V(D)J decomposition algorithm, Tcrip [83], which we have used in combination with a ‘remapping’ step for unmapped reads to effectively analyze cDNA sequences of both the TCRα and TCRβ repertoires [83]. Compared with the widely used online software IMGT/HighV-QUEST [79], our algorithm has a much greater input capacity (up to millions of reads per job), which is critical for analyzing high-volume NGS data. Moreover, compared with MiXCR, our algorithm provides additional information, such as unmapped sequence reads, which has enabled us to discover molecular changes of biological and genetic importance. For example, we observed a relatively large proportion of TCRβ reads containing intronic sequences in samples from NSCLC patients who had been treated with multiple chemotherapy regimens, but not in samples from healthy donors [83]. In these intron-containing cDNA sequences, the intron between the J and C segments is correctly spliced out, so contamination with genomic DNA is very unlikely. Given that these patients had received intensive chemotherapy (2–4 different regimens), we hypothesize that chemotherapy severely damaged their T lymphocytes and impaired their splicing machinery. For the analysis of BCR repertoire sequencing data, we have developed an analogous algorithm, Bcrip [28], which provides highly concordant clonotype assignments (V–J–C and CDR3) and can potentially detect novel exons.

Pairing of TCRαβ sequencing—functional analysis for reconstruction of TCR

Although TCRβ sequencing alone is sufficient to determine TCR clonality, it is important to gather information on both the α and β transcripts of the TCR, because antigen specificity is determined by the conformational pairing of the two chains [84]. Previous efforts to obtain pairing information have involved single-cell sorting and Sanger sequencing of the isolated T-cell clones [85,86,87]. However, single-cell methods are limited by low throughput and high cost, and the field will inevitably require methods that examine millions of T cells or B cells to monitor the dynamic changes of our immune responses. Recent reports have scaled up the throughput of sequencing paired transcripts in B cells [88]; however, this method still yields fewer reads per sample than our new method. Currently, we can predict the combination of α and β chains that constitutes a TCR if the T cells of particular interest are highly expanded. However, when α and β chains are present at lower frequencies, it is almost impossible to predict how the two chains combine.

In the future, it will be essential to develop more efficient methods for obtaining paired-chain information to decipher antigen specificity. A number of studies have proposed methods to address TCR pairing without single-cell technologies [89, 90]. One such method relies on a pairing algorithm: T cells from the same individual are distributed across a set of wells and amplified with well-specific barcode sequences that identify the well from which each α or β sequence came [90]. Putative pairs are then identified based on the principle that true TCRα and β pairs should occupy the same wells. By comparing the well occupancy patterns of each α and β sequence, sequences that share wells more frequently than expected by chance are called as pairs. This method has been used to identify 200,000 TCRα and β pairs from PBMC samples seeded at 160,000 cells per well, with a false discovery rate of 1 percent.
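The well-occupancy idea can be sketched with a hypergeometric test: if an α and a β sequence were distributed across wells independently, how likely is it that they share at least the observed number of wells? This Python sketch is a simplified illustration under that assumption; the fixed `threshold` is hypothetical, and a real analysis such as [90] must also control the false discovery rate across all candidate pairs.

```python
from math import comb

def cooccurrence_pvalue(wells_a, wells_b, n_wells):
    """Hypergeometric tail P(shared >= observed) under independent well occupancy."""
    a, b = len(wells_a), len(wells_b)
    k = len(wells_a & wells_b)
    # P(X = x) = C(a, x) * C(n - a, b - x) / C(n, b); sum the tail from x = k.
    tail = sum(comb(a, x) * comb(n_wells - a, b - x) for x in range(k, min(a, b) + 1))
    return tail / comb(n_wells, b)

def candidate_pairs(alpha_wells, beta_wells, n_wells, threshold=1e-6):
    """All (alpha, beta, p) whose well co-occurrence p-value is below threshold."""
    pairs = []
    for a_seq, wa in alpha_wells.items():
        for b_seq, wb in beta_wells.items():
            p = cooccurrence_pvalue(wa, wb, n_wells)
            if p < threshold:
                pairs.append((a_seq, b_seq, p))
    return sorted(pairs, key=lambda t: t[2])

# Toy 96-well experiment: TRA_1 and TRB_1 occupy the same 20 wells.
alpha = {"TRA_1": set(range(20))}
beta = {"TRB_1": set(range(20)), "TRB_2": set(range(50, 70))}
for a_seq, b_seq, p in candidate_pairs(alpha, beta, n_wells=96):
    print(a_seq, b_seq, f"p = {p:.1e}")
```

The all-pairs loop is quadratic in the number of sequences; a practical implementation would index sequences by well and test only pairs that co-occur at least once.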

Summary

The use of next-generation sequencing for the genetic characterization of the immune system, known as immunogenomics/immunopharmacogenomics, will be important for a deeper understanding of the pathogenesis of various disease conditions. Abnormal immune responses in our body lead to the development of autoimmune diseases and food allergy. Rejection of donor cells by the recipient, or attack on recipient tissues by donor cells as in GVHD, is likewise caused by uncontrolled immune responses. Many reports indicate that immune responses activated through drug–HLA interactions are involved in drug-induced skin hypersensitivity and liver toxicity. The importance of host immune responses has been recognized in cancer treatment, not only for immunotherapy but also for cytotoxic agents and molecularly targeted drugs. Thus, characterization of the TCR and BCR repertoires by NGS will ultimately enable us to identify the molecular mechanisms underlying various diseases. In addition, this approach may contribute to the identification of antigens associated with disease onset or progression. Computational analyses that draw meaningful inferences about the functional recognition receptors of immune cells, however, remain a major challenge. In this review, we have summarized the importance of TCR and BCR deep sequencing and proposed immunogenomics/immunopharmacogenomics as the next frontier in scientific discovery, as it continues to uncover the complex nature of our immune system, which plays critical roles in the pathogenesis of various diseases.