Molecular profiling of immunoglobulin heavy-chain gene rearrangements unveils new potential prognostic markers for multiple myeloma patients

Multiple myeloma is a heterogeneous disease whose pathogenesis has not been completely elucidated. Although B-cell receptors play a crucial role in myeloma pathogenesis, the impact of clonal immunoglobulin heavy-chain features in the outcome has not been extensively explored. Here we present the characterization of complete heavy-chain gene rearrangements in 413 myeloma patients treated in Spanish trials, including 113 patients characterized by next-generation sequencing. Compared to the normal B-cell repertoire, gene selection was biased in myeloma, with significant overrepresentation of IGHV3, IGHD2 and IGHD3, as well as IGHJ4 gene groups. Hypermutation was high in our patients (median: 8.8%). Interestingly, regarding patients who are not candidates for transplantation, a high hypermutation rate (≥7%) and the use of IGHD2 and IGHD3 groups were associated with improved prognostic features and longer survival rates in the univariate analyses. Multivariate analysis revealed prolonged progression-free survival rates for patients using IGHD2/IGHD3 groups (HR: 0.552, 95% CI: 0.361−0.845, p = 0.006), as well as prolonged overall survival rates for patients with hypermutation ≥7% (HR: 0.291, 95% CI: 0.137−0.618, p = 0.001). Our results provide new insights into the molecular characterization of multiple myeloma, highlighting the need to evaluate some of these clonal rearrangement characteristics as new potential prognostic markers.


Introduction
B-cell lymphoproliferative disorders are caused by the expansion of a pathological clone at a specific stage of differentiation reflected in the B-cell receptor (BCR). Through multiple, hierarchically structured events, BCR genes from each B cell are assembled in a particular shape that confers an extremely high diversity to the B-cell repertoire to recognize different antigens 1

. IGHV, IGHD
and IGHJ immunoglobulin genes go through recombination during early B-cell ontogeny that takes place in the bone marrow (BM), an entirely random, antigenindependent process driven by the Recombination-Activating Genes (RAG1 and RAG2) 2 and Terminal deoxynucleotidyl Transferase (TdT) 3 . In humans, there are more than 50 IGHV genes, 27 IGHD genes and 6 IGHJ genes. IGHV and IGHD genes are clustered into seven different groups 4 . During B-cell differentiation, one allele is rearranged to produce a complete heavy-chain rearrangement (IGH) with or without the rearrangement of the second IGH allele 5 . This means that one or two IGH rearrangements can be found in each individual B cell, although only one of them is expressed as a functional protein 6 . Immunoglobulin-expressing B cells then migrate to germinal centers to undergo two antigen-dependent processes: somatic hypermutation (SHM) and classswitch recombination (CSR) 7 .
As a clonal B-cell malignancy, multiple myeloma (MM) arises from a single cell, that has been recognized as a post-germinal center, IgA/IgG switched B cell in several studies 8 . VDJH usage, Complementarity-determining Region 3 (CDR3) composition and somatic hypermutation (SHM) levels have been studied for the vast majority of B-cell malignancies, as well as for normal, healthy B cells [9][10][11][12] . The most interesting finding to date is the close association between SHM rate and the outcome in chronic lymphocytic leukemia (CLL), since a high SHM rate (>2%) is associated with good prognosis 13 . Another interesting observation is the presence of stereotyped immunoglobulin receptors, not only in CLL 14 but also in mantle-cell lymphoma (MCL) or marginal zone lymphoma (MZL), which led researchers to infer that this was a common phenomenon for all mature B-cell tumors with potential prognostic and therapeutic implications 15 . Several studies performed with MM patients treated with conventional chemotherapy did not find any correlation between prognosis and VDJH gene usage, CDR3 amino acid composition or SHM rates [16][17][18] . However, most series often included an insufficient number of heterogeneously treated patients, which makes any conclusion difficult to be inferred.
Here, we present the largest-to-date analysis of the VDJH repertoire in multiple myeloma, consisting in biological and clinical data from 413 patients diagnosed and treated according to the Spanish Myeloma Group (GEM-PETHEMA) protocols. Data were used for a comprehensive investigation of VDJH rearrangement characteristics, including IGHV, IGHD and IGHJ gene usage, SHM level and distribution, or complementarity-determining region (CDRs) and framework region (FWRs) length and composition. We also tried to identify the presence of potential clusters of stereotyped receptors, assessing potential relationships between molecular characteristics and clinical outcomes.
According to the International Myeloma Working Group (IMWG) criteria 26 , patients with either t(4;14), t (14;16) or del17p were grouped together as a high-risk (HR) cytogenetics subset for subsequent analysis. Patients with symptomatic myeloma were also stratified based on the Revised International Staging System (R-ISS) 27 .
This study was approved by the Ethical Committee of the University Hospital of Salamanca in accordance with the Spanish law and the Declaration of Helsinki principles. Written informed consent was obtained from every patient prior to the inclusion in each trial.

Sample collection and IGH sequencing
Genomic DNA was isolated from bone marrow aspirates at the time of diagnosis using the automated DNA Purification kit Maxwell® (Promega, Madison, WI, USA). DNA quantification and quality assessment were done using the NanoDrop2000 Spectrophotometer (Thermo-Fisher, Waltham, MA, USA). For the amplification of complete VDJH rearrangements, we used the BIOMED-2 (now Euroclonality©) FR1 primers 28 in multiplexed PCR reactions. All reactions were carried out in a 25 μL mixture containing~100 ng DNA and 10 μmol of each primer. Monoclonal assessment of amplified products was carried out by GeneScanning using 1 μL of PCR reaction. PCR products were sequenced in an automated ABI3500 XL DNA sequencer using Big-Dye terminators v3.1 (Applied Biosystems™, Foster City, CA, USA).
One hundred and thirteen samples were also analyzed by next-generation sequencing (NGS) using the Lym-phoTrack® methodology (Invivoscribe Technologies, San Diego, CA, USA). We targeted the Framework Region 1 to amplify VDJH rearrangements from 113 patients. In one-step PCR, amplicons were generated and indexed. A purification step using Agentcourt AMPure XP microbeads (Beckman Coulter Inc, Brea, CA, USA) was performed; then, we assessed the quality of our amplicons using the TapeStation 4200 (Agilent, Santa Clara, CA, USA) and Qubit 2.0 (ThermoFisher, Waltham, MA, USA). Libraries were sequenced in an MiSeq platform (Illumina, San Diego, CA, USA) with 2 × 251 of read length and aiming for 1 million reads per sample.
For data analysis, the values of all parameters measured per tube were mathematically calculated for the individual plasma-cell events using the merge and calculation functions of the INFINICYT™ v2.0 software (Cytognos S. L. Salamanca, Spain). Plasma cells were identified based on the characteristic pattern of expression of CD38, CD138, CD45 and light scatter features.

Sequence analysis
IGHV, IGHD and IGHJ genes from complete rearrangements detected using either Sanger or NGS were identified comparing them with the IMGT/V-Quest database 29 , taking into account the option for insertions/ deletions. Mutational status was assessed using the closest germline gene. Sequences containing more than 2% deviation from the germline were considered somatically mutated, following previously described criteria 13 . Point mutations were annotated, as well as the length and composition of N-, P-and CDR3 regions.
FastQ files generated by deep sequencing were processed using the LymphoTrackAnalysis® tool to retrieve sequences from virtually every clonal B cell in the samples, allowing the identification of tumor clones if the following criteria were met: (i) ≥20,000 total reads; (ii) at least one but not more than 2 merged top reads ≥ 2.5% of total reads and (iii) top first or second merged reads must be at least twice more abundant than the third most frequent read to be considered clonotypic.
Rearrangements were considered biallelic if the first two reads met the aforementioned criteria, had similar read count (less than 50% difference) and one was productive while the other one unproductive. Moreover, rearrangements were considered potentially biclonal if the first two reads met the aforementioned criteria, had similar read count (less than 50% difference) and both were productive.

CDR3 clustering analysis
ClustalX2.0 was used as previously described 30,31 , comparing CDR3 amino acid sequences obtained from our cohort and 1117 additional sequences obtained from the IMGT/LIGM-DB database 32 corresponding to unique rearrangements from both normal and human tumor B cells.
Primary and secondary standard criteria 31,33 were used as described before to characterize immunoglobulin clusters.

Statistical analysis
Clinical and biological characteristics were analyzed discriminating between transplant-eligible and noneligible patients, as well as asymptomatic patients, with the SPSS 20.0 software (IBM, Armonk, NY, USA) using Fisher's exact test for discrete variables and the Mann−Whitney test for continuous variables. The Kaplan-Meier method and the log-rank test were used to plot and compare progression-free survival (PFS) and overall survival (OS) curves. Cox regression was used to perform the multivariate analysis. Receiver operating characteristic (ROC) curves were used to evaluate potential cutoff values for some variables. PFS was defined as the time from diagnosis to either disease progression, death or the last follow-up visit. OS was defined as the time from diagnosis to the last follow-up visit or decease. All reported p values were obtained by a two-sided exact method, at the conventional 5% significance level (p < 0.05).

Clinical characteristics
Baseline characteristics of the 413 patients included in the study are summarized in Table 1.
For transplant-eligible patients (n = 228), R-ISS stages I, II and III represented 24.1%, 62.8% and 13.1% of this subset, respectively. After induction, before the pretransplant conditioning regimen started, the overall response ratio (ORR) was 92.4%: 15.3% were in stringent complete response (sCR); 24.1% of patients had achieved complete response (CR); and 26.5% of patients were in very good partial response (VGPR). The remaining patients were in partial response (PR, 26.5%). Median PFS was 49.5 months; median OS was 67.9 months.
As expected, high-risk asymptomatic patients (n = 36) were younger (median age: 60 years). Serum albumin levels were higher (median: 3.76 g/dL) and β2 microglobulin levels were lower (median: 2.25 mg/L). Consequently, 61.9% of smoldering patients were at ISS stage I, and 38.1% were at stage II.

Detection of clonal VDJH gene rearrangements
Sanger sequencing allowed the identification of full clonal VDJH gene rearrangements in 88% of patients (363/413). In 23 cases, the clonal IGH gene rearrangement was not detected, possibly due to a high polyclonal background or lack of PCR amplification due to an incorrect primer-annealing. In 12 cases, we could only identify the IGHV gene, while in the remaining cases we could identify two out of three genes (mostly IGHV and IGHJ, but not IGHD, n = 13).
Combining Sanger and NGS, the overall VDJH detection rate was 92.5% (382/413). In all, 388 complete rearrangements were detected, including the identification of five cases with a biallelic rearrangement, as well as one case with a potential biclonal myeloma that was later confirmed by flow cytometry (Fig. S1). Among them, 362 rearrangements (93.3%) were productive, while 26 rearrangements (6.7%) were unproductive (Fig. 1). Six of these unproductive sequences corresponded to light-chain myeloma patients.

Gene repertoire
IGHV, IGHD and IGHJ usage was identified in productive rearrangements. Table 2 shows the frequencies of IGHV gene group usage among the 362 productive rearrangements, with observed and expected frequencies (based on the number of genes per IGHV group), as well as SHM rates. No major differences were observed comparing transplant and nontransplant subgroups.
Compared to previous myeloma series, IGHV gene frequencies in our cohort were similar with three exceptions (IGHV3-20, IGHV4-30-2 and IGHV6-1) that were not found in our patients. We also observed that, while IGHV3 gene group representation was similar between both series, IGHV3-30 was overrepresented and IGHV3-30-3 was underrepresented in our cohort.
IGHD3 and IGHD2 groups were the most abundant (30.4% and 25.4%, respectively), with IGHD3-10 (10%) and IGHD2-21 (8%) genes significantly overrepresented (p < 0.05) as compared to the expected value in healthy plasma cells. Conversely, IGHD4-4 and IGHD6-25 were not found in any sequence, and all members from the IGHD1 gene group but IGHD1-26 seemed to be underrepresented (less than 1.5% each one, Fig. S3 and Table S2). All these features were similar in previous myeloma series and were also seen in other hematologic B-cell malignancies and in normal plasma cells, although preference for IGHD2-21 was specially marked only in myeloma.
For IGHJ gene usage, we found a deep bias in the use of certain genes matching previous observations concerning different hematologic malignancies such as CLL, MCL, and the normal bone marrow compartment: a significant overrepresentation of IGHJ4 (46.4%) and IGHJ6 (25%) was estimated ( Fig. S4 and Table S3), while IGHJ2 and IGHJ1 were clearly underrepresented (2.9% and 1.1%, respectively). However, we noticed that, compared to previous series in myeloma, IGHJ4 was underrepresented and IGHJ6 overrepresented. In addition, IGHJ1 and IGHJ6 usage in healthy plasma cells looked completely different not only with respect to myeloma, but also with other mature B-cell disorders.
We explored associations between VDJH repertoire and risk factors. IGHJ6 gene group was more frequent among patients with high-risk cytogenetics (37.9% vs. 17.8%, p = 0.025) considering only transplant-candidate patients. IGHD1 gene group was also more frequently used by patients harboring high-risk cytogenetics, although this association was restricted to patients not candidates for transplantation (41.7% vs. 13.2%, p = 0.026). However, there were not relevant associations between the usage of these genes and the outcome in terms of PFS or OS.
Transplant-eligible and -ineligible patients showed similar proportions in VDJH gene usage (Table S4). In the same line, high-risk smoldering patients did not show significant differences in the use of VDJH genes compared to symptomatic patients.

Somatic hypermutation rates and IGH composition
Somatic mutations could be correctly identified in 349 sequences: mutational load was high (mean: 9.2% ± 3.8; median: 8.8%; 95% CI: 8.8−9.6%; range: 0−22.7%), with five cases showing more than 98% of homology with germline genes. Comparing SHM rates between IGHV groups, we found a significantly lower rate for IGHV2 The first two columns show the expected IGHV gene usage if all genes were randomly selected compared with the observed frequency within our cohort, respectively. Observed and expected IGHV distributions were not different when the χ 2 test was applied, demonstrating that there is no evidence of selection of specific gene groups in myeloma. Median somatic hypermutation rates per IGHV group are shown on the right side of the table. Below, Kruskal−Wallis-based paired comparisons are listed to determine potential differences in the hypermutation rate; a significantly lower mutation rate was found for patients using IGHV2 compared to those using IGHV1 (p = 0.006), IGHV3 (p = 0.012) and IGHV4 (p = 0.001). 95% CI 95% confidence intervals.
The complete sequence was available in 327/362 cases, so we continued the analysis with this set (Fig. 1). As far as the junction region was concerned, nucleotide addition appeared in 92.5% of cases for the N1 region, and in 88.6% of cases for the N2 region. Only three cases did not show evidence of nucleotide addition, either in N1 or in N2. Mean Guanine/Cytosine content of the N-regions was 58.8%. Evidence of exonuclease activity in at least one of the two junction regions (IGHV-to-IGHD and/or IGHDto-IGHJ) was observed in all rearrangements, although 32.2% of cases displayed a fully conserved IGHV-3′ edge.

CDR3 amino acid composition
According to the IMGT numbering, we identified CDR3 positions (from amino acid 105 to amino acid 117). Median CDR3 length was 15 amino acids (range 6−29); this length was consistent across different myeloma subgroups (symptomatic, asymptomatic, transplant-eligible and transplant-ineligible patients). Amino acid proportions in myeloma were not significantly different from the normal B-cell population (Fig. S5), with predominance of Glycine, Alanine, Aspartic acid and Tyrosine.

Clustering analysis
CDR3 sequences from productive rearrangements (n = 362) were subjected to cluster analysis. We found one pair from our cohort that showed more than 60% of amino acid similarity. Consequently, and given they were clusters not previously described, we applied the secondary criteria. Although CDR3 lengths were similar (13 vs. 15 amino acids) and both rearrangements displayed the same IGHJ4*02 germinal gene, IGHV and IGHD groups, as well as IGHD reading frames, were not concordant. Thus, we could not find any stereotyped immunoglobulin cluster among these myeloma patients.

Impact of molecular variables on clinical outcomes
Considering patients with symptomatic myeloma, after a median follow-up of 7.6 years (range 0.5−24.1), 120 patients had died; the majority of them (n = 76) were transplant-ineligible. ROC curves showed that 7 and 8% were the most optimal cutoffs for prognostic value of SHM; optimal cutoffs for serum albumin (3.2 g/dL), β2 microglobulin (3.5 mg/dL), hemoglobin (9 g/dL) and creatinine (1 mg/dL) were also estimated. In addition, IGHD2 and IGHD3 users were grouped for subsequent analyses, because they were the most common IGHD groups and this bias has been observed in nearly every Bcell malignancy [9][10][11][12][13][14][15][16][17][18] . Finally, all molecular data described in previous paragraphs were combined with clinical data to perform statistical comparisons in transplant-eligible and -ineligible patients.
We first looked for relationships between treatment response and molecular variables. No major associations were seen, but two exceptions: (1) SHM ≥ 7% associated with a higher rate of CR/sCR compared to patients with SHM < 7% in the nontransplant subset (45.8% vs. 27.3%, respectively; p = 0.026), and (2) IGHD3-3 usage was associated with a better response in the transplant subgroup (n = 10), since 80% of MM patients with an IGH rearrangement choosing this particular gene achieved CR/ sCR vs. 36.8% for other IGHD genes (p = 0.019).
One hundred and eleven transplant-ineligible patients were included in the multivariate analysis. For PFS (Table 3) Moreover, SHM rate was included in the multivariate regression model not only as a discrete variable but also as a continuous one to avoid potential bias induced with the cutoff selection of 7%; in this case, the variable retained its prognostic value (p = 0.017, HR: 0.909, 95% CI: 0.840−0.983) along with achieving CR/sCR and the absence of cytogenetic risk, showing that an increment of 1% in SHM reduces death risk by 9%.

Discussion
Here, we analyzed in depth the IGH locus, trying to correlate its characteristics with clinical data in a large cohort of 413 MM patients included in clinical trials. VDJH usage description provided new insights into the clonal differentiation and possible clinical implications for MM, that could potentially improve patient care.
IGHV gene repertoire reflected its normal counterpart, with IGHV3 being the predominant gene group. As in other series, IGHV3 was overrepresented and IGHV1 underrepresented, although differences were not significant, with a much less pronounced bias than that observed in CLL 12,13,31 . Similarly, IGHV3-30, IGHV4-59 and IGHV4-39 were overrepresented with respect to other IGHV genes [34][35][36] , and although these genes are also common in other lymphoproliferative disorders and normal plasma cells, we noted that IGHV3-30 was specifically selected only in myeloma. In contrast, IGHV3-23 and IGHV1-69 appeared to be less frequently selected in MM (≥8% of representation in the normal B-cell population 35,36 and Bcell pathologies vs. ≤5% in our series).
IGHV4-34 representation is high in hematologic malignancies such as B-ALL, CLL, MCL or diffuse large B-cell lymphoma as well as in normal B cells 9,11,31 . Interestingly, this gene is usually absent in the normal plasma-cell population 37 , although it shows a wide interindividual usage frequency 38 , and it is rarely selected in MM 17 , but in this work it was not completely absent. IGHV4-34 has been characterized as an inherently autoreactive antibody 39,40 for it is widely represented in autoimmune diseases (i.e. systemic lupus erythematosus). Here, we found four clonal IGHV4-34 rearrangements; they showed very high SHM rates and were usually nonproductive, leading to light-chain myelomas. This could be explained by the fact that during plasma-cell differentiation, those cells displaying autoreactive-prone genes within their IGH rearrangements would be specially targeted by the somatic hypermutation machinery, trying to make them able to recognize only foreign antigens. Moreover, in our series, two out of the three unproductive rearrangements using VH1-69, another gene commonly associated with autoimmune events, such as Thrombotic Thrombocytopenic Purpura 41 , and highly represented in CLL, were light-chain myelomas with some of the highest SHM rates in our cohort (mean 12.23%). Interestingly, this gene is commonly unmutated in CLL patients 42,43 . SHM level was high and similar to what has been observed in other studies for healthy and pathological plasma and memory B cells. This would be in line with the post-germinal center origin of myeloma 8 . A striking finding was the presence of five MM cases with no apparent SHM; however, the IGHV region could not be completely sequenced (the entire FWR1 and part of the FWR2 was missing); from our experience, we think the mutational load was most likely underestimated, rather than being truly unmutated cases.
CDR3 composition resembled the normal immunoglobulin repertoire, with similar length, amino acid use, or evidence of TdT and exonuclease activity 44 . This composition diverges from CLL, MCL or MZL patterns, where restricted antibody sets account for approximately 30% of cases [45][46][47] . Accordingly, cluster analysis failed to demonstrate the presence of stereotyped receptors in our cohort, as it has been shown before in myeloma patients 48 . Overall, the gene repertoire in our series was very close to previous reports 17,18,48 , although functional IGHV and IGHD genes, whose use has not been reported in myeloma to date, have been found in this cohort probably due to its larger size. On the contrary, other genes such as IGHV1-18, IGHV3-20, IGHV4-30-2 and IGHV6-1, found in Italian 48 or Greek 18 patients, are absent in Spanish patients, maybe reflecting differences in the normal repertoire between different Mediterranean subpopulations. No differences were observed in the median length of the CDR3 region, and somatic hypermutation was slightly higher than in previous publications; however, this variation was not statistically significant.
The most remarkable finding was the association between a higher SHM level and an improved survival rate. The explanation for this situation seems to be connected with the maturation stage of tumor cells, represented by the expression of CD19 and CD81 biomarkers, which is consistent with previous findings regarding the immune profile of tumor plasma cells: 49 it suggests that early genetic alterations in the ontogeny of germinal-center B cells would provide a more aggressive profile in MM, which would be strong enough to overcome SHM and/or CSR earlier.
Conversely, the acquisition of late, more harmless genetic events would allow tumor B cells to undergo SHM and CSR processes and this would lead to better outcomes. Another possible explanation is related to the proliferation ratio of tumor cells: highly proliferative B cells (and therefore, more aggressive clones) activate DNA repair pathways that are able to reduce the SHM level, as it has been shown in CLL patients 50,51 . However, the SHM cutoff is based on FR1 primers; this threshold could differ when using leader primers.
Another striking feature was the correlation between the use of IGHD2 and IGHD3 gene groups and the outcome. An increased usage of these particular groups has also been observed in other series suggesting a positive selective pressure over them 8,17,18 , since the bias was observed only for complete rearrangements. Nevertheless, whether or not the relationship between IGHD usage and survival is a myeloma characteristic remains uncertain.
SHM and IGHD usage were associated with the outcome only in the elderly, transplant-ineligible patients. Thereby, the effect of intensive regimens could be the main reason to explain why the negative prognostic impact of these molecular characteristics is diluted in younger, fit patients.
Regarding the value of clinical variables on the patient outcome, the prognostic role of cytogenetics, hemoglobin, albumin, β2 microglobulin and ECOG in MM was confirmed 52 . Cytogenetic risk has to be mentioned as this variable showed significant associations with survival in all cases but when applied to PFS in transplant-ineligible patients. The absence of FISH studies for most patients diagnosed from 1995 to 2000 (most of them being older) made possible to find only 17 patients with high-risk cytogenetics in this subgroup, which could explain this result.
In conclusion, we have shown that multiple myeloma plasma cells resemble the normal mature B-cell repertoire but, in contrast to CLL, MCL or MZL, stereotyped receptors were absent 14,15,53,54 . Light-chain and nonsecretory myeloma cases could be explained by incorrect IGH rearrangements at the genomic DNA level, but not always, which means that other effects could hamper the immunoglobulin production, including transcriptional or translational defects. Finally, here, and for the first time, we have reported how some molecular characteristics of the IGH gene rearrangement, such as higher SHM rates or IGHD2/IGHD3 gene usage, could positively impact on the outcome of patients. A validation series is currently planned to elucidate whether these findings could be applicable in real-life patients; if validated in independent studies, they could be considered as new molecular markers for PFS and OS in multiple myeloma.
de Hematología y Hemoterapia (FEHH, co-funded by Fundación Cris in the latter case), A.M. by the European Social Fund and the Spanish Education Council through the University of Salamanca, and M.E.S. by the ISCIII (CPII18/ 00028). All Spanish funding is co-sponsored by the European Union FEDER program. The authors wish to thank José Pérez, Alicia Antón, Montserrat Hernández-Ruano, Estrella Arnés, Rebeca Maldonado, Mercedes Jiménez, Alejandra Martín, Isabel Sánchez, Rocío Corral and Francisco Boix (University Hospital of Salamanca, Spain) for their technical support, as well as the Spanish Multiple Myeloma "GEM-PETHEMA" group for providing clinical data. The authors are also very grateful to the patients who participated in this study.
Author details 1