Profiling the TRB and IGH repertoire of patients with H5N6 Avian Influenza Virus Infection by high-throughput sequencing

Avian Influenza A (H5N6) Virus causes severe influenza disease in humans and is manifested by acute respiratory distress syndrome, multi-organ failure, and high mortality rates. T cells recognize antigens specifically through a membrane protein T cell receptor (TCR). To ward off a wide variety of pathogens, the human adaptive immune system harbors a vast array of TCRs, which are collectively referred to as the TCR repertoire. The B cell receptor (BCR) is involved in inducing the humoral immune response. The generation of a diverse T cell and B cell repertoire is essential for protection against infection. In this study, multiplex PCR based on genomic DNA amplicons and Illumina high-throughput sequencing (HTS) were applied to study the characteristics and polymorphisms of the TRB and IGH repertoire in the peripheral blood mononuclear cells (PBMCs) from two H5N6 AIV patients and six healthy donors (NC). The CDR3 average length in the AIV group was different from the NC group. The TRBV12-3, TRBV12-4, and TRBV15 gene segments and TRBV30/TRBJ1-2, TRBV12-3/TRBJ1-1 and IGHV3-11/IGHJ6 gene segment pairings also exhibited a higher usage in the PBMCs of AIV donors and may provide more information for generating more effective T/B cell targeted diagnosis/protection strategies.

most adults have circulating influenza-specific memory CD8 + T cells 8 , which provide protection against pursuant virus exposure 9 . After the Influenza virus is cleared, antigen specific memory CD8 + T cells are long-lived and are capable of rapid recall to effector function. Evidence 10,11 suggests that CD8 + T-cells mediate cross-reactive protection in the face of heterologous prime/challenge among H1N1, H2N2, H3N2 and H5N1 viruses. However, neutralizing antibodies generated against HA can protect from infection with a specific strain, and these antibody responses are generally not cross-reactive with other influenza strains 12 . The adaptive immune system protects each individual against insults, such as infection, and responds to vaccination with B cell proliferation in response to the antigenic stimulation, which is composed of B and T cells and produces a large number of antibodies. The foundation of the adaptive immune response is based on the enormous diversity of T and B cells, which produce a large number of antibodies. This profound diversity of T (TCRs) and B cell receptors (BCRs) is generated by V-D-J gene recombination of the TCR/BCR locus and the subsequent somatic hypermutation and class-switching recombination of B cells after antigen stimulation. Understanding which B cells produce specific antibodies and their kinetics conferring immunity against the variety of antigens has application in the field of vaccine development and in assessing response to infection 13 . The diversity of the human B cell repertoire is estimated, and the theoretical potential includes up to 10 11 unique variants 14 ; such diversity makes it difficult to analyze the repertoire.
Next-generation sequencing technologies have opened a new era in the field of T/BCR repertoire research, which includes studies on immune repertoire reconstitution after therapy, response to vaccines and subpopulation repertoire structure 15 . Thus, the study of the immune repertoire, portrayed as the antigen-specific information within lymphocytes, is key to understanding the response of adaptive immunity during infection.
The present study showed the TCR and BCR sequence characteristics from the peripheral blood of two patients with H5N6 Avian Influenza Virus Infection by HTS. From the HTS findings, we searched potential reactive clones and offer important new information for the disease.

Results
Case report. The first case is a 40-year-old female from the Duanzhou District, Zhaoqing City, with an onset date of 22 December. The patient was admitted to the hospital on 28 December and is now in critical condition. The second case is a 25-year-old male from Shenzhen City, Guangdong Province, who developed symptoms on 1 January. The patient was admitted to the hospital on 4 January and is now cured and left the hospital. He has a history of visiting a live poultry market. He worked in his father's restaurant and had no other medical problems in his past history.
The two patient's conditions deteriorated rapidly, and one-sided pneumonia progressed to two-sided pneumonia. When patient 1 arrived at the designated hospital, her condition was critical. She was given invasive mechanical ventilation and was transferred to the intensive care unit immediately. Other treatments, including gastric protective treatments, methylprednisolone, intravenous immunoglobulin and fluid infusion, were given. The complications and treatments are shown in Table 1. Despite active treatment, her condition continued to deteriorate. Fifty days after admission, she died from multi-organ failure. The male patient arrived at the designated hospital with 80% oxygen saturation, and then, he received noninvasive ventilator-assisted breathing. He was also given treatments, including gastric protective treatments, methylprednisolone, intravenous immunoglobulin and fluid infusion. His condition improved gradually, and 14 days after the illness onset, he was discharged.

Summary of sequencing.
To investigate the characteristics of the T/B cell immunity of the patient, the TCR β chains and BCR IGH from the PBMCs were sequenced using a HTS platform (IlluminaMiseq). The CDR3 sequences are identified by the conserved motif, and the abundance of each CDR3 clone and the number of distinct CDR3 clone species were calculated. The raw data are shown in Table 2. The profiles of the CDR3 lengths are an important determinant of the T/B cell repertoire diversity and were consistent with a polyclonal representation of clonotypic TCR and BCR sequences. The TCR CDR3 length varied from 6 to 19 nt, with a peak at 11 or 12 nt. The BCR CDR3 length varied from 4 to 29 nt, with a peak at 11, 16 or 17 nt. All the groups displayed good Gaussian distribution of the CDR3 length. Then, we fitted the Gaussian distribution curve for each sample with Excel, and the goodness of fit was quantified by R 2 , which ranged from 0 to 1 (that is, from worst fitness to best fitness).
TRBV and TRBJ usage. The IMGT TRB gene database collected 48 functional TRBV genes and 13 functional TRBJ genes 16,17 . Here, 57 distinct TRBV genes and 14 distinct TRBJ genes were detected. The usage ratio of this set of 57 TRBV genes ranged from 0.00% of TRBV6-8 to 11.1% of TRBV2 in patient 1, while the usage ratio ranged from 0.01% of TRBV3-2 to 11.52% of TRBV30 in patient 2 (Fig. 1). The gene expression level of TRBV2 was the highest in both the patients and healthy controls. TRBV30 was highly expressed in the two patients (10.15%, 11.52%) but was low in the controls (3.62%). TRBV7-2 showed the highest expression in patient 1, while TRBV12-3, TRBV12-4 and TRBV15 showed the opposite. The data suggested that these genes might be associated with the severity of the disease, which may provide protection. The usage ratio of this set of 14 TRBJ genes ranged from 0.45% of TRBJ2-4 to 18.54% of TRBJ1-1 in patient 2 and showed a similar ratio in patient 2 and the controls (Fig. 2).
In the present study, 598, 598 and 626 TRBV-TRBJ pairings were identified in the two patients and controls. The significant pairs are listed ( Table 3). The most significant pairing from patient 1 was TRBV30/TRBJ1-2, accounting for 2.61% of the total pairings, while the most frequent pairing from patient 2 was TRBV12-3/TRBJ1-1, accounting for 8.04% of the total pairings. TRBV CDR3 sequence diversity. The extent of the interindividual TCR sharing was determined by the TCR clonotype frequencies. Our analysis focused mainly on the DNA and aa sequences of complementarity-determining www.nature.com/scientificreports www.nature.com/scientificreports/ www.nature.com/scientificreports www.nature.com/scientificreports/ region 3 (CDR3), which is the most diverse region of the TCR molecule and is associated with antigen epitope recognition. In this study, we defined that clones with a frequency above 0.1% of total reads in a sample were Highly expanded clones (HECs). There were 9 TCR highly expressed clones in the patients. A summary of these HEC sequences is presented in Table 4. Next, we investigated whether these HECs overlapped between the different patients, but we found that patient 2 shared some HEC sequences with the controls. Patient 1 was critical, which means that the HEC may be associated with the severity of AIV.
In the present study, there were 301, 308 and 345 IGHV-IGHJ pairings identified in the two patients and the controls. The significant pairs are listed ( Table 5). The most significant paring for patient 2 was IGHV3-11/IGHJ6, accounting for 20.09938% of the total pairings.
Highly expanded clones (HEC) and BCR repertoire diversity. The expression level of each clone was calculated according to the identity of each sequence after the alignment. The degree of expansion of each sample clone was based on the unique CDR3 sequence frequency. Here, we defined BCR clones with a frequency above 0.1% of the total reads in a sample as HECs. In the patient samples, we observed 36 clones in patient 1 and 33 in patient 2, while in the healthy control group, no HECs were defined. A comparison of the degree of expansion of the most expanded clones in each group showed that the HEC number and the HEC ratio in the AIV group were more expanded than those in the NC group. The shared B cells or public B cells are of long-term interest both in healthy and disease states. Next, we investigated whether the HECs overlapped between the different individuals Items  www.nature.com/scientificreports www.nature.com/scientificreports/ of the same group or between the AIV and NC groups. We found that each individual had a unique repertoire, and no HEC was shared by different individuals. Our results suggest that the entire BCR repertoire of the peripheral blood has a much more skewed clonotype composition in AIV patients than in the healthy controls (Table 6).

Discussion
Humans H5N6 avian influenza viruses were first confirmed in China in May 2014. The patients all had direct contact with poultry prior to the onset of the disease 1,18 . However, these patients did not contact poultry, and they were infected with H5N6 influenza. Hence, indirect contact with poultry, such as passing a live poultry market, needs further attention. The two cases were adults aged 40 and 25 years. The clinical manifestations of these patients were unresolved fever, cough, expectoration of sputum, sometimes accompanied by hemoptysis, and shortness of breath. They occurred with a rapid progression of acute respiratory distress syndrome. These were consistent with the previous report of H5N1 infection 19 .
T lymphocytes play an important role in mediating anti-Influenza virus immune responses. Influenza virus can be cleared efficiently by the adaptive immune system, consisting of antibody-producing B cells and influenza-specific T lymphocytes with diverse functions. For mild infections, whether caused by seasonal influenza viruses or occasional asymptomatic AIVs, the memory CD8 + and/or CD4 + T cells provide a great level of protection 20,21 . The identification of human T cell epitopes helps in efforts to determine how concurrent simultaneous infection with H5N6 viruses affects CD8 T cell immunity. The treatment strategies will have to be modified to recruit low-affinity and cross-reactive T cells.
In the past, the study of the T/BCR repertoire was based on traditional technologies, such as polymerase chain reaction (PCR)-single strand conformation polymorphism analysis (SSCP), flow cytometry and immune spectratyping. The T/BCR repertoire is hard to reflect and is limited by the number of sequence data. A deep-sequencing   www.nature.com/scientificreports www.nature.com/scientificreports/ approach to assess the T/BCR repertoire is possible with the advances in HTS, and we now have powerful tools to reveal information on the TCR and BCR repertoire after infectious states. Conversely, H5N6 avian influenza viruses-specific TCR/BCR clonalities will be valuable markers to assess the nature and quality of a vaccine response 22 .
In the present study, we used HTS of the human TCR/BCR heavy chain repertoire to gain insight into T/B-cell dynamics after H5N6 infection, and we obtained H5N6 infection-induced changes in the total repertoire. Direct sequencing of the TRBV CDR3 regions from the T-cell gDNA library using HTS fully captured all the CDR3 length diversity. Then, virtual TRBV spectra types, according to the CDR3 lengths of the 46 TRBV gene segments, were examined. In the Patients, V families (TRBV7-2, TRBV10-3, TRBV7-3, TRBV7-8, TRBV10-1, TRBV24-1, TRBV24/OR9-2) were demonstrated at the top. However, more V families (TRBV11-1, TRBV5-8, TRBV28,

Clonotype
Public sequence    www.nature.com/scientificreports www.nature.com/scientificreports/ TRBV13, TRBV5-4, TRBV6-7, TRBV7-4, TRBV6-9, TRBV16, TRBV3-2) showed low expansion. However, TRBV4-2 was more highly expressed in patient 2 compared to the others. To better understand the shared nature of the T cells in disease presentation associated with H5N6 AIV infections, we analyzed the CDR3 of the clonally expanded T cells and found differentially expressed ASNSGGELF, ASSLQGRTEAF, ASSNGRPNEKLF, ASSPPRLADTGELF, ASSPR and HGQPNTEAF amino acid sequences in the CDR3 of the TRBV chains. These findings may facilitate the allogeneic adoptive T cell immunotherapy, and more dominant CDR3 sequences in the distinct length forming the clonal expansion would be easily identified by the high-through sequencing technology. Thus, the sequence information of the CDR3 may be a valuable resource for disease diagnosis and drug design.
In addition, analyses of the composition of the TCR and H-CDR3 showed that the overall AA compositions of the CDR3 in each AIV patient were not significantly different. These results revealed that small numbers resulted in a lack of a validated system to describe patient outcome. While this is a descriptive and preliminary work in a very small group of patients, we used a novel NGS protocol to investigate the TRB and IGH repertoire of patients with H5N6 Avian Influenza Virus Infection at the sequence level for the first time. In further research, more samples will be included.
In conclusion, we demonstrated a successful approach for profiling the entire T/B cell repertoire at sequence-level resolution in AIV patients. By applying the method described herein, we identified some T/BCR   www.nature.com/scientificreports www.nature.com/scientificreports/ repertoire features. These global repertories contain a wealth of information and provide us many opportunities to gain new insights into humoral immunity and offer practical results for the benefit of patients.

Patients and controls. Two cases of human infection with avian influenza A (H5N6) virus were laborato-
ry-confirmed by the National Health and Family Planning Commission (NHFPC) of China. The first case is a 40-year-old female from the Duanzhou District, Zhaoqing City, with an onset date of 22 December. The second case is a 25-year-old male from Shenzhen City, Guangdong Province, who developed symptoms on 1 January. The 6 healthy controls were adults without any basic diseases, and they were three males (25-30 years old) and three females (35-40 years old). DNA from 6 control' B cells and T cells were mixed together by 1:1:1:1:1:1 according to Qubit value, renamed one Heath control pool.
Ethics statement. Written informed consent for study participation was obtained from all the subjects.
The use of PBMCs for further studies beyond routine diagnosis was approved by the Third People's Hospital of Shenzhen and the Zhaoqing First People's Hospital ethics committee. This study abides by the Helsinki Declaration on ethical principles for medical research involving human subjects.

Mononuclear cell preparation, T cell and B cell isolation and DNA extraction. Peripheral blood
mononuclear cells (PBMCs) were prepared using fresh whole blood from 2 patients and 6 healthy individuals. The blood was collected in Monovette EDTA tubes. Cell isolations and DNA extractions were performed under Biosafety Level 3 containment. The PBMCs were isolated by density gradient centrifugation using Lymphoprep (Axis Shield, UK) according to the manufacturer's instructions. B lymphocytes were isolated with MicroBeads (Miltenyi Biotech, Germany) according to the manufacturer's instructions. Then, the B cells were enriched by magnetic cell sorting (MACS) according to the manufacturer's instructions (Miltenyi Biotech). Non-B cells were indirectly magnetically labeled with a cocktail of biotin-conjugated monoclonal antibodies as the primary labeling reagent and anti-biotin monoclonal antibodies conjugated to MicroBeads as the secondary labeling reagent.
The magnetically labeled non-B cells were depleted by retaining them on a MACS ® Column in the magnetic field of a MACS Separator, while the unlabeled B cells passed through the column. The T cells were isolated with anti-human CD3 magnetic beads according to the manufacturer's protocol (Miltenyi Biotec, Bergisch, Gladbach, Germany). The T-cell purity was >90% (data not shown), as determined by flow cytometry using mouse anti-human antibodies CD3-phycoerythrin (PE) (BD Biosciences, San Jose, CA). DNA was extracted using QIAamp DNA Mini Kit (Qiagen, Germany) following the manufacturer's instructions, and the DNA integrity was analyzed by agarose gel electrophoresis.
Primer design and multiplex-PCR amplification. The human IGH and TRB sequences were downloaded from IMGT (http://www.imgt.org/) 23 . A relative conserved region in frame region 3 upstream of CDR3 was selected for putative forward primer region. A cluster of primers corresponding to the majority of the V gene sequence family was selected. Similarly, reverse primers corresponding to 6 types of the J gene family were designed. The forward and reverse primers were analyzed by Oligo 7.0 and MFEprimer-2.0 for primer dimer and loop structures. Minor changes of the sequences were made for low-quality primers. The final IGH CDR3 primer sequences are shown in Table 7. The utilized 12 forward primers and 4 reverse primers were used for multiplex PCR to amplify rearranged IGH CDR3 region. Similarly, the utilized 30 forward primers and 13 reverse primers were used for multiplex PCR to amplify rearranged TCR-β CDR3 region (Table 8).  www.nature.com/scientificreports www.nature.com/scientificreports/ After amplification and agarose gel electrophoresis selection, the products were purified using the QIA quick PCR Purification Kit. The final library was quantitated in two ways as follows: by determining the average molecule length using the Agilent 2100 bioanalyzer instrument (Agilent DNA 1000 Reagents) and by real-time quantitative PCR (QPCR) (TaqMan Probe). The libraries were amplified with cBot to generate the cluster on the flow cell, and the amplified flow cell was pair-end sequenced using an Illumina Miseq instrument, with a read length of 150 bp as the most frequently used sequencing strategy.
The PCR conditions were set as 95 °C for 15 minutes, followed by 25 cycles of 94 °C for 15 seconds, 60 °C for 3 minutes, and a final extension for 10 minutes at 72 °C. The PCR products were purified by AMPure XP beads to remove the primer sequences (Beckman Coulter, Germany). A second round of PCR was performed to add a sequencing index to each sample. The PCR condition was set as 98 °C for 1 minutes, followed by 25 cycles of 98 °C for 20 seconds, 65 °C for 30 seconds and 72 °C for 30 seconds, with a final extension for 5 minutes at 72 °C. The library was separated on an agarose gel, and the target region was isolated and cleaned by QIAquick Gel Extraction Kits (Qiagen, Germany). TRBV2F-IND  CAGACGTGTGCTCTTCCGATCTAGATTTCACTCTGAAGATCCGGTCCAC   TRBV9F-IND  CAGACGTGTGCTCTTCCGATCTAGCCTGACTTGCACTCTGAACTAAACCT   TRBV14F-IND  CAGACGTGTGCTCTTCCGATCTAGGGAGGGACGTATTCTACTCTGAAGG   TRBV15F-IND  CAGACGTGTGCTCTTCCGATCTAGTTCTTGACATCCGCTCACCAGG   TRBV19F-IND  CAGACGTGTGCTCTTCCGATCTAGTCCTTTCCTCTCACTGTGACATCGG   TRBV3-1-F4-IND   www.nature.com/scientificreports www.nature.com/scientificreports/ HTS and data analysis. The PCR products were sequenced using an Illumina Genome Analyzer, and the sequencing quality of these read was evaluated by the formula shown below. The quality of the Miseq sequencing ranged from 0 to 40 and was used for filtering out low-quality reads. First, we filtered the raw data, including the adapter contamination. Reads with an average quality score lower than 15 (Illumina 0-41 quality system) were removed, and the proportion of N bases was not more than 5% (sequences with higher values were also removed). Next, a few bases with low quality (lower than 10) were trimmed; the quality score was expected to be over 15 after trimming, and the remaining sequence length was expected to be more than 60 nt. After filtering, the pair-end (PE) read pairs were merged into one contig sequence in two steps as follows: (1) by aligning the tail parts of two sequences and assessing the identity, with at least 10 bases of overlap required and the overlapping section having a 90% base match and (2) since different primers might result in sequences of different lengths, some might be very short (less than 100 bp) and will go through all the bases in the sequence, and thus, such reads were merged by aligning the head part of the sequence. In this way, we obtained the merged contig sequences and the length distribution plot. Subsequently, we used the Tcrip algorithm to perform the sequence analysis. The sequencing reads were mapped to the TCR and BCR sequences downloaded from IMGT/GENE-DB (http:// www.imgt.org). The V, D, and J genes were designated according to the nomenclature provided by the international ImMunoGeneTics information system (IMGT). After alignment, we utilized the following method for the sequence structural analysis: (1) we calculated the number of each nucleotide and analyzed the proportion at each position; (2) according to the last position of the V gene, the start site of the D gene, the end site of the D gene, and the start site of the J gene after alignment, we retrieved the INDEL (insertion and deletion) introduced during V-D-J recombination; and (3) nucleotides were translated into amino acids. According to the identity of each sequence after the alignment, the expression level of each clone was clear and calculated. The expression of each distinct DNA sequence, amino acid sequence and V-J combination was also identified. In addition, to measure the diversity of each sample, we calculated the distinct clone number, Simpson coefficient and Shannon-Waver coefficient based on the different resolutions of the distinct DNA sequences, amino acid sequences, and V-J combinations. The expression level of each sample was also calculated at different resolutions of distinct DNA sequences, amino acid sequences, and V-J combinations. Moreover, we constructed the specific expression draft and plotted a heat-map according to the V-J combination profile. The diversity of the TCR and BCR repertoire was calculated based on the Simpson index of diversity (Ds) 24