High-throughput sequencing of IgG B-cell receptors reveals frequent usage of the rearranged IGHV4–28/IGHJ4 gene in primary immune thrombocytopenia

Primary immune thrombocytopenia (ITP) is an acquired form of thrombocytopenia caused by IgG anti-platelet autoantibodies and represents an organ-specific autoimmune disorder. Although glycoprotein (GP) IIb/IIIa and GPIb/IX have been shown to be targets for autoantibodies, the antigen specificity of the autoantibodies is not fully elucidated. To identify the characteristics of IgG B-cell receptor (BCR) repertoires in ITP, we used adaptor-ligation PCR and high-throughput DNA sequencing to analyze the clone-based repertoires of IgG-expressing peripheral blood B cells. A total of 2,009,943 in-frame and 315,469 unique reads for IGH (immunoglobulin heavy) were obtained from twenty blood samples. Comparison of the IGHV repertoires between patients and controls revealed an increased usage of IGHV4–28 in ITP patients. One hundred eighty-six distinct IGHV4–28-carrying sequences were identified in ITP patients, and the majority of these clones used an IGHJ4 segment. The IGHV4–28/IGHJ4-carrying B-cell clones were found in all ITP patients. Oligoclonal expansions of IGHV4–28/IGHJ4-carrying B cells were accompanied by multiple related clones with a single amino acid substitution in the CDR3 region, suggesting somatic hypermutation. Taken together, the expansion of IGHV4–28/IGHJ4-carrying IgG-expressing B cells in ITP may be the result of certain antigenic pressure and may provide a clue to the immune pathophysiology of ITP.

High-throughput sequencing of BCR genes has revealed the landscape and longitudinal changes of B-cell repertoires and has identified clonal expansions [11][12][13][14][15][16][17][18] . Recently, Kitaura et al. developed a new BCR repertoire analysis method that combines adaptor-ligation polymerase chain reaction (PCR) with next-generation sequencing and enables comprehensive quantitative analysis of BCRs at the clonal level 19 . Somatic hypermutation among antibody subclasses can be readily detected by this method. Taking advantage of this novel method, we investigated the repertoires of IgG-BCRs of peripheral blood B cells from ITP patients in order to identify the characteristics of IgG-BCR repertoires in this disorder, and found oligoclonal expansions of IGHV4-28/IGHJ4-carrying IgG-expressing B cells with small clonal sizes.

IGHV repertoires of IgG BCRs in primary ITP.
A total of 2,009,943 in-frame and 315,469 unique reads were obtained from twenty blood samples, with 29,049 to 160,013 reads (100,497 reads on average) per sample. The global usage of IGHV, IGHD, and IGHJ segments did not differ between the patients and controls (Fig. 1). Patient characteristics are described in Supplementary Table 1. The mean values of IGHV1-24 and IGHD3-3 were much higher in ITP than in controls, and this was due to the presence of one outlier in the ITP cohort. In this particular ITP patient, an expansion of IGHV1-24-carrying B-cell clones was detected, although its clinical significance was not clear. In the other ten ITP patients, the IGHV1-24 subfamily comprised less than 1% of the total B-cell repertoire. However, we found significantly increased usage of IGHV4-28 (0.053% vs. 0.005%, p = 0.006) and decreased usage of IGHV3-15 (1.28% vs. 3.63%, p = 0.04) in ITP patients (Fig. 2A). The Simpson and Pielou diversity indices were not statistically different between the two groups, but the Shannon scores were slightly higher in ITP patients (Fig. 2B). The total numbers of in-frame reads in ITP and control were similar, but the total unique reads in ITP were higher. Thus, the richness in B-cell clones in the ITP patient cohort might have affected the difference in Shannon diversity scores.
Figure 1 caption (fragment). The mean values of IGHV1-24 and IGHD3-3 were much higher in ITP than in controls, which was due to the presence of one outlier in the ITP cohort. There was no significant difference in the usage of IGHV, IGHD, and IGHJ segments between the patients and controls except for the IGHV4-28 and IGHV3-15 segments.
Figure 2 caption (fragment; beginning lost). [...] were not statistically different between the two groups (p = 0.303 and 0.095, respectively). The Shannon scores were slightly higher in ITP patients (p = 0.031). The total numbers of in-frame reads in ITP and control were 885,793 and 1,124,150, respectively, and the total unique reads in ITP and control were 215,640 and 99,829, respectively. Thus, the richness in B-cell clones was even higher in the ITP group than in the control group of this cohort.
www.nature.com/scientificreports
B-cell clones carrying IGHV4-28 in primary ITP. There were 186 and 34 distinct IGHV4-28-carrying sequences identified in ITP patients and controls, respectively (Fig. 3). One hundred twenty-six of the 186 IGHV4-28-carrying clones (67.7%) used the IGHJ4 gene segment in patients with ITP, and 9 of the 34 IGHV4-28-carrying clones (26.5%) used the IGHJ4 gene segment in control donors. There were no shared public clones among ITP patients or between the ITP and control groups. Because somatic hypermutation in non-CDR3 regions caused some ambiguity in the assignment of IGHV segments within the IGHV4 subfamily, we compared the sequence reads of the V, D, and J segments and CDR3 of such clones and considered them to use the same IGHV gene if they shared the same CDR3 sequence and IGHJ gene.
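The read totals and IGHJ4 proportions quoted in this section can be cross-checked with a few lines of arithmetic. The sketch below only re-derives figures stated in the text (per-group read counts, the 20 samples, and the 126/186 and 9/34 clone counts); the variable names and the `percent` helper are illustrative and are not part of the authors' pipeline.

```python
# Read counts and clone counts as reported in the text (not re-measured here).
in_frame = {"ITP": 885_793, "control": 1_124_150}
unique = {"ITP": 215_640, "control": 99_829}
n_samples = 20

total_in_frame = sum(in_frame.values())   # should match the 2,009,943 in-frame reads
total_unique = sum(unique.values())       # should match the 315,469 unique reads
mean_reads = total_in_frame / n_samples   # average in-frame reads per sample


def percent(part, whole):
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Share of IGHV4-28-carrying clones that use IGHJ4 in each group.
ighj4_itp = percent(126, 186)
ighj4_control = percent(9, 34)

print(total_in_frame, total_unique, round(mean_reads))
print(ighj4_itp, ighj4_control)
```

The per-group counts sum to the reported totals, and the two percentages agree with the 67.7% and 26.5% given in the text.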
Deduced amino acid sequences of the CDR3 region of overexpressed IgG-BCR with IGHV4-28. We examined the structures of the CDR3 region of IgG-BCRs expressed on IGHV4-28-carrying B-cell clones. The distribution of CDR3 lengths of the 186 distinct clones in ITP was compared with that of the 34 clones in control donors (Fig. 4A). Previously, Kitaura et al. reported that the CDR3 length of all IgG subclasses in healthy individuals showed a Gaussian-like distribution with a median length of 18 amino acids 19 . In the present ITP patient cohort, the CDR3 length distribution was skewed, with an additional peak at 13 and 14 amino acids. This was supported by the difference in distribution parameters between control and ITP (Supplementary Table 2). Glycine (G) and lysine (K) were used more frequently in these short CDR3 sequences of IGHV4-28-carrying BCRs than in all IGHV4-28-using BCRs in ITP and controls (Fig. 4B-D, Supplementary Tables 3 and 4). Five IGHV4-28 clones with a CDR3 of 13 or 14 amino acids were detected in control donors, but lysine was not used in those clones (Supplementary dataset for Fig. 3). There was no apparent sequence homology in the CDR3 region among patients (Table 1).
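As a rough illustration of the two quantities compared in this paragraph, a CDR3 length distribution and a per-residue usage frequency can be computed as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the example CDR3 sequences are invented for illustration and are not taken from the study, and the authors' own software may define these quantities differently.

```python
from collections import Counter


def cdr3_length_distribution(cdr3_seqs):
    """Return {length: fraction of clones} for a list of CDR3 amino acid strings."""
    counts = Counter(len(s) for s in cdr3_seqs)
    total = sum(counts.values())
    return {length: counts[length] / total for length in sorted(counts)}


def residue_frequency(cdr3_seqs, residue):
    """Fraction of all CDR3 positions occupied by the given one-letter residue."""
    joined = "".join(cdr3_seqs)
    return joined.count(residue) / len(joined) if joined else 0.0


# Hypothetical CDR3 sequences, invented for illustration only.
example = ["ARGKGGYFDY", "ARGKGGYFDYW", "ARDLSSGWYFDY", "ARGKGGYFDH"]

dist = cdr3_length_distribution(example)   # fraction of clones at each CDR3 length
gly = residue_frequency(example, "G")      # overall glycine usage
lys = residue_frequency(example, "K")      # overall lysine usage
print(dist, round(gly, 3), round(lys, 3))
```

Restricting the input list to clones of a given CDR3 length (for example 13 or 14 residues) and comparing the resulting frequencies with those of all clones mirrors the comparison described above.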
Somatic hypermutation in the CDR3 region of IGHV4-28/IGHJ4-carrying IgG-BCR. To test whether the expansion of IgG BCRs using an IGHV4-28 segment was the result of certain antigenic pressures, we analyzed the deduced amino acid sequences of the CDR3 region of all IGHV4-28/IGHJ4 BCRs in the same individual. As shown in Table 2, twenty-one distinct B-cell clones with the rearranged IGHV4-28/IGHJ4 gene were identified in one patient (patient number AHK03G), and clone #33 was the most prevalent. Among B-cell clones with the same CDR3 sequence, there were several clones with distinct non-CDR3 sequences (clones #34, 48, 33, 36, 38, 35, and 63). Fourteen other clones had CDR3 regions with a single amino acid substitution compared to that of clone #33. This oligoclonal expansion of particular B cells with IGHV4-28/IGHJ4-carrying IgG-BCR and the presence of multiple related clones with a single amino acid substitution in the CDR3 region were also observed in two other patients, AHK07G and AHK02G (Supplementary Tables 5 and 6). These findings suggest that the expansion of B cells with IGHV4-28/IGHJ4-carrying IgG BCR may be the result of antigenic pressure leading to somatic hypermutation of CDR3.
Detection of somatic hypermutation in the CDR3 region of IgG-BCR following vaccination. To validate the BCR repertoire analysis in terms of detecting the clonal proliferation of B cells and accompanying somatic hypermutation after antigenic stimulation, we investigated the IGH repertoire following seasonal influenza vaccination in healthy volunteer donors. Blood samples were taken on days 0, 7, and 28 following vaccination, with the sampling schedule determined according to previous reports 12,20 . Wu et al. reported that the influenza vaccine induced changes in the IGHV repertoire in human peripheral blood by day 7. Thus, we selected IGHV subfamilies that showed a transient increase on day 7 following influenza vaccination in healthy donors. The healthy donor AHK13G had increased usage of IGHV1-18 and IGHV3-15 segments (Supplementary Fig. 1). High-throughput sequencing of IGH enabled us to identify the clonal expansion of a previously present B-cell clone carrying IGHV1-18/IGHJ5 following vaccination, and this clonal expansion was accompanied by somatic hypermutation of the CDR3 region (Table 3). The expanded B-cell clones decreased by day 28. We also confirmed a transient clonal proliferation of B cells carrying IGHV3-15/IGHJ4 accompanied by somatic hypermutation of CDR3 (data not shown). Similar results were confirmed in another healthy volunteer donor, AHK12G, who received flu vaccination (Supplementary Table 7). However, we could not determine whether the oligoclonal expansion of B-cell clones accompanied by multiple related clones following flu vaccination was just the result of an expansion of pre-existing clones or a clonal response to flu vaccination accompanied by affinity maturation. These findings demonstrate that high-throughput BCR repertoire analysis can identify antigen-driven clonal proliferation of B cells, although we did not show the antigen specificity of the expanded B cells.
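The notion of "related clones with a single amino acid substitution in the CDR3 region" used in the two preceding subsections can be made concrete as a Hamming distance of exactly 1 between equal-length CDR3 sequences. The sketch below is illustrative only: the function names and the example sequences are hypothetical, and it is not the authors' implementation. A single-residue difference seen in only a few reads cannot, by itself, be distinguished from a PCR or sequencing error.

```python
def hamming(a, b):
    """Number of mismatched positions, or None if the lengths differ."""
    if len(a) != len(b):
        return None
    return sum(x != y for x, y in zip(a, b))


def single_substitution_variants(dominant_cdr3, other_cdr3s):
    """CDR3 sequences that differ from the dominant one at exactly one position."""
    return [s for s in other_cdr3s if hamming(dominant_cdr3, s) == 1]


# Hypothetical CDR3 amino acid sequences, invented for illustration only.
dominant = "ARGKGGYFDY"
others = [
    "ARGKGGYFDH",    # one substitution (Y -> H at the last position)
    "ARGRGGYFDY",    # one substitution (K -> R)
    "ARGKGGYFDY",    # identical to the dominant CDR3
    "ARDLSSGWYFDY",  # different length, not comparable position by position
    "ARGRGGYFDH",    # two substitutions
]

related = single_substitution_variants(dominant, others)
print(related)
```

Only the first two candidates are reported, since the others are identical, of a different length, or differ at two positions.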

Discussion
We found preferential usage of the rearranged IGHV4-28/IGHJ4 gene in circulating IgG B cells in ITP. Oligoclonal proliferation of B cells has previously been reported, but the structural information of BCRs in primary ITP was unclear 21 . Roark et al. previously reported the usage of IGHV3-30 by the anti-platelet immunoglobulin from two patients with ITP using Fab/phage display libraries 22 . We did not find a difference in the usage of IGHV3-30 between ITP patients and control donors (Fig. 1). These conflicting results may be explained by the limited number of patients examined or by different HLA backgrounds. Although the antigen specificity of BCRs encoded by the IGHV4-28/IGHJ4 gene in the present patient cohort remains to be elucidated, our findings may contribute to the development of genetic testing for the diagnosis of ITP. Kuwana et al. previously reported that the mean number of anti-GP IIb/IIIa antibody-secreting B cells in peripheral blood was 8.2 cells per 10^5 mononuclear cells in primary ITP patients. On average, IGHV4-28-bearing B cells accounted for 0.053% of the total population of IgG-expressing B cells in ITP. This figure is much higher than the frequency of anti-GP IIb/IIIa antibody-secreting B cells in blood mononuclear cells, but this may be explained by the fact that our assay was based on RNA expression and not on the size of the clones. Another possible explanation could be that the anti-GP IIb/IIIa antibody represents the pathogenic autoantibodies in ITP.
We found a skewed distribution of CDR3 lengths in IGHV4-28-carrying BCRs in ITP, and glycine and lysine were more frequently used in the short CDR3 regions. In general, glycine is found in loop regions at the surface of proteins and has only a hydrogen atom as its side chain, thereby providing high flexibility to the polypeptide chain at these locations. Lysine is a positively charged amino acid. This amino acid profile may reflect antigenic selection and affinity maturation.
The CDR regions are generally responsible for antigen recognition of immunoglobulins. In addition, some non-CDR regions are also responsible for antigen binding of antibodies 23 . Some residues of framework regions may comprise a part of the binding site, and others may provide structural support to CDR regions, thereby affecting antigen binding. We could not find any public clones using an IGHV4-28 segment. Our data have encouraged us to test the hypothesis that preferential usage of the rearranged IGHV4-28/IGHJ4 gene by circulating IgG-expressing B cells may be a potential diagnostic marker for primary ITP using a large patient cohort.
A limitation of this study is that the relevance of the usage of the rearranged IGHV4-28/IGHJ4 gene to the pathogenesis of primary ITP in adults remains unclear. One potential explanation may be age, because long-term antigenic stimulation may shape the BCR repertoire. However, this explanation does not seem likely because there was no significant difference in age distribution between the patient cohort and the control group. The second potential explanation may be that the IGHV4-28 segment is a non-core gene, which is seen in some but not all individuals 24 . However, this possibility is also unlikely because all nine control donors and all ITP patients had IGHV4-28-bearing B cells. Developing anti-idiotype antibodies against the immunoglobulins reacting with platelets may resolve this missing link between the frequent usage of the rearranged IGHV4-28/IGHJ4 gene and the immune pathophysiology of ITP. There are also some technical limitations in this study. We could not distinguish CDR3 sequences supported by small numbers of reads from PCR errors. This issue needs to be solved by another technology in the near future.
Some IGHV genes are expressed less frequently, and the IGHV4-28 gene is one of those less frequently used genes 24 . The new BCR repertoire analysis described here, which combines adaptor-ligation PCR with next-generation sequencing, could detect oligoclonal expansions of B cells with small clonal sizes. Thanks to this method, we were able to detect the increased expression of this non-core gene by circulating B cells in ITP patients. To our knowledge, the use of this strategy to characterize the immune receptor repertoires in ITP patients has not been reported previously.
The IGHV4-28/IGHJ4-carrying clones mostly used an IGHG1 or IGHG2 gene segment as the constant region. Human IgG is divided into four subclasses with different heavy chains, and each of them has its own functional properties. IgG1 is the most prominent immunoglobulin in human sera, and IgG2 generally comprises 20 to 25% of total IgG. In humans, IgG2 dominates in the response to thymus-independent antigens such as pneumococcal polysaccharides 27 . Thymus-independent antigens can activate splenic marginal zone B cells. With regard to the IgG subclasses of anti-platelet autoantibodies, Chen et al. previously reported that IgG1 was the most common isotype for platelet-associated autoantibodies directed against glycoprotein IIb/IIIa, either alone or with other IgG subclass antibodies 28 . They also observed that some ITP patients had only IgG2 autoantibodies against IIb/IIIa. Taken together, ITP appears to be a heterogeneous disorder caused by anti-platelet autoantibodies that include various patterns of IgG subclasses. Therefore, both thymus-dependent and thymus-independent antigens could contribute to the expansion of IGHV4-28/IGHJ4-carrying B cells in the present patient cohort.
In summary, we found that circulating IgG-expressing B cells preferentially used transcripts of a rearranged IGHV4-28/IGHJ4 gene in primary ITP, and that clonal expansion of IGHV4-28/IGHJ4-carrying B cells was accompanied by somatic hypermutation in the CDR3 region. Although the responsible antigens remain unknown, these findings suggest that the expansion of IGHV4-28/IGHJ4-carrying B cells might be the result of antigenic pressure in ITP.
Samples and RNA extraction. Ten mL of heparinized peripheral blood was taken from each patient, and mononuclear cells (PBMCs) were isolated by density gradient centrifugation. Total RNA extraction and cDNA synthesis were performed by standard protocols. Total RNA was isolated from the PBMCs and purified with RNeasy Mini Kit (Qiagen, Hilden, Germany) according to the manufacturer's instructions. The amount and purity of RNA were measured using the Agilent 2100 Bioanalyzer (Agilent Technologies, Palo Alto, CA, USA).
Unbiased amplification and high-throughput sequencing of IgG-BCRs. Unbiased amplification of IgG-BCRs was performed by adaptor-ligation PCR, and the amplicons were sequenced by next-generation sequencing 19,34 . One hundred nanograms of total RNA was converted to cDNA. A specific primer, BSL-18E, containing polyT18 and a NotI site was used for cDNA synthesis 35,36 . Following the synthesis of dsDNA, a specific adaptor, P10EA/P20EA, was ligated to the blunted end of the dsDNA, and the adaptor-ligated dsDNA was then digested with the NotI restriction enzyme. After clean-up, the first PCR was performed with primer pairs specific for the constant region and the P20EA sequence. The second PCR was performed with a constant region-specific nested primer and a P20EA-specific primer. Primer sequences were previously reported 19 . After amplification, index sequences were added using a Nextera XT Index Kit v2 Set A (Illumina, San Diego, CA, USA). The indexed amplicons were mixed at equimolar concentrations, and the mixtures were subjected to next-generation sequencing using the Illumina MiSeq paired-end platform (2 × 300 bp). Theoretically, total RNA was extracted from approximately 10^5 to 10^6 B cells in each donor, although flow cytometric analysis was not routinely performed in this experiment. The sequencing depth was chosen according to the scale of the samples and in an attempt to limit the errors introduced by excessively deep sequencing 37 . The cost of sequencing was an additional consideration. Thus, the suitable sequencing depth was pre-determined as approximately 100,000 reads per sample.

Data analyses.
Each sequence read was analyzed with the bioinformatics software created by Repertoire Genesis Incorporation (Ibaraki, Japan), and the usage of IGHV, IGHD, IGHJ, and IGHC segments and the CDR3 (complementarity determining region 3) sequences were determined according to previously reported methods 19,34 . Briefly, the V, D, J, and C regions were assigned by identifying the sequence with the highest identity in the reference sequence data sets available from the international ImMunoGeneTics information system (IMGT) database (http://www.imgt.org). Data processing, assignment, and aggregation were performed automatically. Reads with identical V, D, and J segments and an identical deduced amino acid sequence of CDR3 were defined as a unique sequence read. A unique sequence read could therefore contain several variant sequences formed by somatic hypermutation, and we considered sequence reads sharing identical V, D, and J segments and identical CDR3 amino acid sequences as an identical clonal lineage. The Repertoire Genesis software automatically counted the copy numbers of the unique sequence reads and ranked them in order of copy number.
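The definition of a unique sequence read given above (identical V, D, and J assignments plus an identical deduced CDR3 amino acid sequence) amounts to grouping reads on a four-part key and ranking the groups by copy number. The following is a minimal sketch of that idea, not the Repertoire Genesis software: the record layout (`v`, `d`, `j`, `cdr3_aa` keys), the function name, and the example reads are all assumptions made for illustration.

```python
from collections import Counter


def rank_unique_reads(reads):
    """Group reads by (V, D, J, CDR3 amino acids) and rank groups by copy number.

    `reads` is a list of dicts with the hypothetical keys "v", "d", "j", "cdr3_aa".
    Returns a list of ((v, d, j, cdr3_aa), copy_number), most abundant first.
    """
    counts = Counter((r["v"], r["d"], r["j"], r["cdr3_aa"]) for r in reads)
    return counts.most_common()


# Hypothetical annotated reads, invented for illustration only.
reads = [
    {"v": "IGHV4-28", "d": "IGHD3-3", "j": "IGHJ4", "cdr3_aa": "ARGKGGYFDY"},
    {"v": "IGHV4-28", "d": "IGHD3-3", "j": "IGHJ4", "cdr3_aa": "ARGKGGYFDY"},
    {"v": "IGHV4-28", "d": "IGHD3-3", "j": "IGHJ4", "cdr3_aa": "ARGKGGYFDH"},
    {"v": "IGHV3-15", "d": "IGHD3-3", "j": "IGHJ4", "cdr3_aa": "ARGKGGYFDY"},
    {"v": "IGHV4-28", "d": "IGHD3-3", "j": "IGHJ4", "cdr3_aa": "ARGKGGYFDY"},
]

ranked = rank_unique_reads(reads)
for key, copies in ranked:
    print(copies, *key)
```

Reads that share a CDR3 but carry a different V assignment fall into separate groups under this key, which is why ambiguous IGHV4 subfamily assignments needed the additional manual comparison described in the Results.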
Repertoire diversity. To estimate the repertoire diversity, we calculated the Shannon diversity, Simpson, and Pielou evenness indices using the R program 38 . The Shannon index was normalized by dividing it by the logarithm of the total number of unique reads.
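For readers unfamiliar with these indices, the sketch below computes them from a list of clone copy numbers. It is written in Python rather than R and rests on assumptions the text does not settle: the natural logarithm is used, and the Simpson index is taken in its Gini-Simpson form, 1 - Σp². The normalization described above (Shannon index divided by the logarithm of the number of unique reads) coincides with the standard definition of Pielou evenness, H / ln(S). The function name and the example inputs are hypothetical.

```python
import math


def diversity_indices(copy_numbers):
    """Shannon, Gini-Simpson, and normalized Shannon indices from clone copy numbers."""
    total = sum(copy_numbers)
    props = [c / total for c in copy_numbers if c > 0]
    shannon = -sum(p * math.log(p) for p in props)   # H = -sum(p * ln p)
    simpson = 1.0 - sum(p * p for p in props)        # assumed form: 1 - sum(p^2)
    n_unique = len(props)
    # Normalized Shannon index: H / ln(number of unique reads); 0 for a single clone.
    normalized = shannon / math.log(n_unique) if n_unique > 1 else 0.0
    return {"shannon": shannon, "simpson": simpson, "normalized_shannon": normalized}


# Hypothetical copy-number profiles: a perfectly even and a skewed repertoire.
even = diversity_indices([10, 10, 10, 10])
skewed = diversity_indices([37, 1, 1, 1])
print(even)
print(skewed)
```

With four equally abundant clones, the normalized Shannon index reaches its maximum of 1; the skewed profile, dominated by one clone, scores lower on all three indices.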
Statistics. Statistical significance was tested with an independent-samples t-test for comparing the mean values of two groups, combined with Levene's test for equality of variances. Statistical analyses were performed using IBM SPSS version 23 software.
Data Availability. For original data, please contact mhirokawa@hos.akita-u.ac.jp.