High expression of HMGA2 independently predicts poor clinical outcomes in acute myeloid leukemia

In acute myeloid leukemia (AML), risk stratification based on cytogenetics and mutation profiling is essential but remains insufficient to select the optimal therapy. Accurate biomarkers are needed to improve prognostic assessment. We analyzed RNA sequencing and survival data of 430 AML patients and identified HMGA2 as a novel prognostic marker. We validated a quantitative PCR test to study the association of HMGA2 expression with clinical outcomes in 358 AML samples. In this training cohort, HMGA2 was highly expressed in 22.3% of AML, mostly in patients with intermediate or adverse cytogenetics. High expression levels of HMGA2 (H + ) were associated with a lower frequency of complete remission (58.8% vs 83.4%, P < 0.001), worse 3-year overall survival (OS, 13.2% vs 43.5%, P < 0.001) and relapse-free survival (RFS, 10.8% vs 44.2%, P < 0.001). A positive HMGA2 test also identified a subgroup of patients unresponsive to standard treatments. Multivariable analyses showed that H + was independently associated with significantly worse OS and RFS, including in the intermediate cytogenetic risk category. These associations were confirmed in a validation cohort of 260 patient samples from the UK NCRI AML17 trial. The HMGA2 test could be implemented in clinical trials developing novel therapeutic strategies for high-risk AML.


Introduction
In adult acute myeloid leukemia (AML), clinical outcome is predicted by age, cytogenetics and specific gene mutations. [1][2][3][4][5] In the recent European LeukemiaNet (ELN) guidelines for AML genetic testing, screening for mutations in NPM1, CEBPA, RUNX1, FLT3, TP53, and ASXL1 genes in addition to chromosomal anomalies is recommended. 1 It is now well accepted that the genetic and cytogenetic risk stratification guides AML consolidation therapy: patients in a favorable risk category are treated with conventional consolidation chemotherapy, whereas adverse-risk patients are usually referred for allogeneic hematopoietic stem cell transplantation (allo-HSCT), a procedure carrying an inherent mortality rate surpassing 15%. 6 However, the ideal consolidation therapy remains unclear for up to 40% of AML patients classified in the intermediate-risk category, hence the need to improve prognostic assessment in this patient subgroup. 1 Likewise, identification of possible long-term survivors in the adverse-risk group represents another clinical challenge.
Gene expression signatures, mostly derived from microarray studies, have been evaluated as a means to further improve AML risk stratification. [7][8][9][10][11][12][13][14] Although several markers have been identified, they have not been widely adopted because of technical challenges in implementing large gene signatures in clinical settings. Global RNA-sequencing technologies, which are more accurate in estimating gene expression levels than microarray studies, 15 have now been applied to a few large AML cohorts including that of The Cancer Genome Atlas (TCGA, n = 179) 16 and Leucegene (n = 430). [17][18][19][20][21] These data sets provide new opportunities to determine whether candidate gene expression levels can complement currently accepted prognostic tests.
In this study, we have explored the Leucegene data set using bioinformatic tools to identify genes with bimodal expression patterns that correlate with patient survival. The two best candidate genes, High Mobility Group AT-Hook 2 (HMGA2) and Pro-Apoptotic WT1 Regulator (PAWR), were evaluated in the training cohort, but only HMGA2 was validated in the independent cohort. We present the development and inter-laboratory validation of an RT-qPCR HMGA2 clinical test and demonstrate its utility in refining AML risk stratification.

Study design, patients, and AML sample characteristics
This study is part of the Leucegene project and was approved by the Research Ethics Boards of Université de Montréal and Maisonneuve-Rosemont Hospital. Diagnostic AML samples and clinical data were collected with informed consent from patients between 2002 and 2014 at nine hospitals participating in the Banque de cellules leucémiques du Québec program (BCLQ, bclq.org). The Leucegene full cohort of 430 RNA-sequenced samples (Fig. 1) was used for the discovery of new candidate prognostic markers. RNA-sequencing data are available separately, [17][18][19][20][21] #GSE49642, #GSE52656, #GSE62190, #GSE66917, #GSE67039. The training cohort includes 263 de novo AML patients treated with intensive regimens and sequenced in the Leucegene project, and 95 additional BCLQ specimens similarly selected, which were not sequenced (Fig. 1). The median follow-up was 6.0 years. Patients who were alive were censored at their last follow-up (May to August 2015). Four additional patients were censored owing to loss to follow-up. Definitions of complete remission (CR), overall survival (OS), relapse-free survival (RFS) and cumulative incidence of relapse (CIR) followed ELN recommendations. 1 Descriptions of clinical characteristics and treatment protocols are provided in the Supplementary Information (Supplementary Figures S1-S2; Supplementary Tables S1-S4). AML samples (n = 70) from Australia were used to confirm the distribution of HMGA2 expression values (Fig. 1 and Table 1). Patients with intermediate- and adverse-risk cytogenetics were selected for external validation because the HMGA2 test appears useful in these risk categories. HMGA2 expression values were not available for 3 of the 263 external validation samples.
Cytogenetics, mutation analysis, and RNA sequencing
Cytogenetic risk was categorized according to ELN recommendations. 1 Methods for leukemia cell cryopreservation and for NPM1, FLT3-ITD, and CEBPA mutation testing are described in the Supplementary Information. The workflow for RNA-sequencing and mutation analysis has been described previously. 20

Quantitative PCR experiments
An RT-qPCR assay to evaluate HMGA2 expression was developed. Detailed methods, including complementary DNA synthesis, primer and probe sequences, PCR, construction of plasmid standard curves and results of analytical validation, are outlined in the Supplementary Information (Methods section, Supplementary Tables S6 and S7). Normalized copy numbers (NCN) of HMGA2 were generated following Europe Against Cancer program recommendations. 22
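As an illustration of how a normalized copy number can be derived from plasmid standard curves, the sketch below inverts a linear standard curve (Ct against log10 copies) and expresses target copies relative to a reference gene. The slope, intercept, reference gene and the scaling factor of 10^4 are illustrative assumptions, not the values or conventions of the HMGA2 assay, which are given in the Supplementary Information.

```python
def copies_from_ct(ct, slope, intercept):
    """Invert a standard curve of the form Ct = slope * log10(copies) + intercept."""
    return 10 ** ((ct - intercept) / slope)


def normalized_copy_number(target_ct, reference_ct, target_curve, reference_curve, scale=1e4):
    """Target copies per `scale` copies of a reference gene.

    `target_curve` and `reference_curve` are (slope, intercept) pairs.
    The reference gene and the default scale are hypothetical choices
    made for this example only.
    """
    target_copies = copies_from_ct(target_ct, *target_curve)
    reference_copies = copies_from_ct(reference_ct, *reference_curve)
    return scale * target_copies / reference_copies


# Hypothetical curve with a slope close to 100% PCR efficiency.
example_curve = (-3.32, 40.0)
example_ncn = normalized_copy_number(30.04, 26.72, example_curve, example_curve)
print(round(example_ncn, 1))
```

With these made-up inputs the target corresponds to about 10^3 copies and the reference to about 10^4 copies, so the normalized value is close to 1000.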

Statistical methods
Receiver operating characteristic (ROC) curves and the Youden index were used to identify a threshold between low and high HMGA2 expression values. 23,24 Fisher's exact test was used to test bivariate unadjusted associations between the marker, dichotomized as above (H + ) vs. below (H −) the threshold, and categorical variables. Probabilities of OS were estimated with Kaplan-Meier curves and compared using the log-rank test. CIR curves were estimated using competing risks analyses to account for mortality and compared with Gray's test. 25 OS was measured from the date of AML diagnosis, and RFS and CIR were measured from the date of achievement of a remission. For studies in the subgroup of younger transplanted patients, time 0 was defined as the date of transplantation. Main analyses relied on multivariable regression methods to estimate the associations of the dichotomized marker with each of the clinically relevant outcomes. Multivariable models were adjusted for the following set of established prognostic variables: age, white blood cell (WBC) counts, HSCT as a time-dependent variable (except for CR prediction), cytogenetic risk and NPM1 and FLT3-ITD mutation status. TP53, RUNX1, and ASXL1 mutations were added as variables for models in the sequenced cohort and biallelic CEBPA, RUNX1, and ASXL1 mutations for models in the intermediate cytogenetic risk subgroup. The effect of age was modeled using linear and quadratic terms, to account for its significantly non-linear relationships with most of the outcomes (Supplementary Information, Statistical Methods section). The ability of HMGA2 to enhance CR prediction was assessed with multivariable logistic regression, and its independent association with the time to relapse and/or death was estimated by multivariable Cox proportional hazards regression. A flexible time-dependent model was used to test the proportional hazards assumption. 26
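The Youden index used for threshold selection is J = sensitivity + specificity − 1, evaluated at each candidate cutoff along the ROC curve. The following minimal sketch shows the principle on made-up data; the binary outcome used as the reference standard and the toy values are assumptions for illustration and do not reproduce the analysis performed in the training cohort.

```python
def youden_threshold(values, events):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    values: marker measurements (e.g. NCN).
    events: 1 if the adverse outcome occurred, 0 otherwise.
    A sample is called positive when value >= threshold.
    Ties in J are resolved in favor of the lowest threshold.
    """
    positives = sum(events)
    negatives = len(events) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both outcome classes are required")

    best_threshold, best_j = None, float("-inf")
    for threshold in sorted(set(values)):
        true_pos = sum(1 for v, e in zip(values, events) if v >= threshold and e == 1)
        true_neg = sum(1 for v, e in zip(values, events) if v < threshold and e == 0)
        j = true_pos / positives + true_neg / negatives - 1
        if j > best_j:
            best_threshold, best_j = threshold, j
    return best_threshold, best_j


# Toy data, not patient data.
toy_values = [50, 200, 400, 900, 1200, 1500, 3000, 8000, 9000]
toy_events = [0, 0, 0, 0, 1, 0, 1, 1, 1]
print(youden_threshold(toy_values, toy_events))
```

In practice, a cutoff chosen this way is optimistic for the data it was derived from, which is why confirmation with the same cutoff in an independent cohort matters.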

Identification of HMGA2 as a new prognostic marker in AML
We first investigated all annotated genes in the Leucegene full cohort (n = 430) for their potential to discriminate between patients with good vs. poor survival by analyzing survival based on the 75th percentile of expression values (Fig. 1). The best candidate prognostic markers were also selected for features that would ease their usage as clinical tests: (1) high dynamic range of expression; (2) evidence for bimodal distribution illustrative of two distinct subgroups with more than tenfold difference in reads per kilobase per million mapped reads (RPKM) values between low and high expressors; and (3) peak expression in high expressors above one RPKM. HMGA2 and PAWR were identified for test development and validation but only HMGA2 was validated in the independent NCRI AML17 validation cohort and is reported herein. Analyses of PAWR in the validation cohort are provided in Supplementary Figure S3 and Supplementary Table S9.
Notably, based on the 75th percentile of HMGA2 expression values, most genetic anomalies associated with poor survival were highly prevalent in the HMGA2-positive subgroup, including samples with complex karyotype, TP53 mutations or other adverse-risk mutations such as RUNX1, ASXL1, SRSF2, and MLL (Supplementary Figure S4).
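The selection features listed in this section can be approximated by a simple per-gene screen. The sketch below is a simplification under stated assumptions: samples are split at the 75th percentile, group medians stand in for the peaks of the low and high expressors, and no formal test of bimodality is performed. The function name and the small floor used to avoid division by zero are hypothetical.

```python
from statistics import median


def is_candidate(rpkm, min_fold=10.0, min_high_peak=1.0):
    """Screen one gene's RPKM values with a simplified version of the criteria.

    Assumptions of this sketch: the top quarter of samples are treated as
    high expressors, and medians approximate the expression peaks.
    """
    ordered = sorted(rpkm)
    cut = int(0.75 * len(ordered))       # samples below index `cut` are low expressors
    low, high = ordered[:cut], ordered[cut:]
    if not low or not high:
        return False
    low_level = max(median(low), 1e-3)   # floor avoids division by zero
    high_level = median(high)
    return high_level / low_level > min_fold and high_level > min_high_peak


# Toy expression profiles, not Leucegene data.
print(is_candidate([0.1] * 6 + [5.0, 8.0]))      # large fold change, high peak above 1 RPKM
print(is_candidate([2.0] * 8))                   # no dynamic range
print(is_candidate([0.001] * 6 + [0.5, 0.6]))    # high expressors below 1 RPKM
```

A screen of this kind only shortlists genes by expression pattern; the association with survival is assessed separately.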

Development and validation of the HMGA2 RT-qPCR test
We developed and validated an HMGA2 RT-qPCR test in three independent AML patient cohorts (Leucegene, NCRI AML17 and Australian cohorts) and confirmed the bimodal expression pattern of HMGA2 (Fig. 1 and Supplementary Figure S5). We observed a high correlation between these results and those found by RNA sequencing or droplet digital PCR as well as a large range of expression values (Supplementary Figure S6; Fig. 2 upper panel). Using ROC curves, the cutoff for the RT-qPCR test was optimized and established at 1100 NCN in the training cohort. 23 Samples with expression levels ≥ 1100 NCN are hereafter referred to as H+ and those with expression levels < 1100 NCN as H−.
In the training cohort, the HMGA2 test showed high reproducibility, robustness, and specificity (Supplementary Table S7). Inter-laboratory test validation was performed at the King's College University of London laboratory, using 263 AML samples from patients of the NCRI AML17 trial.
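The dichotomization at 1100 NCN can be expressed as a short helper. This is a minimal sketch; the function names and the sample values are illustrative, and only the cutoff and the direction of the inequality come from the text.

```python
HMGA2_CUTOFF_NCN = 1100  # cutoff established in the training cohort


def hmga2_status(ncn):
    """Return 'H+' when NCN >= 1100 and 'H-' when NCN < 1100."""
    if ncn < 0:
        raise ValueError("a normalized copy number cannot be negative")
    return "H+" if ncn >= HMGA2_CUTOFF_NCN else "H-"


def fraction_positive(ncn_values):
    """Fraction of samples classified as H+."""
    statuses = [hmga2_status(v) for v in ncn_values]
    return statuses.count("H+") / len(statuses)


# Made-up NCN values spanning a wide dynamic range.
samples = [12, 340, 1099.9, 1100, 5200, 48000]
print([hmga2_status(v) for v in samples])
print(fraction_positive(samples))
```

Note that a value exactly at the cutoff is classified as H+.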

HMGA2 test also adds prognostic value in the 2017 ELN adverse-risk category
We next studied whether the HMGA2 test could improve prognostic assessment in AML patients classified according to the 2017 ELN genetic risk stratification. 1 We found that 45 out of 87 ELN adverse-risk patients (51.7%) (Table 1, Supplementary Table S14) were positive for the HMGA2 test and had significantly worse survival (Supplementary Figure S9, right panel, red curve). In this patient subgroup (ELN adverse-H + ), representing 12.6% of the entire AML training cohort, no patients were long-term survivors. In contrast, the survival of H− patients classified as adverse risk by the ELN risk stratification was similar to that of ELN intermediate-risk patients (Supplementary Figure S9, right panel, yellow and green curves). Importantly, among the 45 H + patients, eight samples harbored mutations in RUNX1 and/or ASXL1 genes (intermediate-risk cytogenetics) and 15 had mutations in TP53 (Supplementary Table S14). This finding is clinically relevant, especially if screening for these poor risk mutations is not readily available.
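The proportions quoted in this section follow directly from the counts given in the text, taking 358 as the size of the training cohort stated in the abstract. A short check:

```python
def percent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


adverse_h_pos = 45      # ELN adverse-risk patients positive for the HMGA2 test
eln_adverse = 87        # all ELN adverse-risk patients
training_cohort = 358   # training cohort size (from the abstract)

print(percent(adverse_h_pos, eln_adverse))      # share of ELN adverse-risk patients who are H+
print(percent(adverse_h_pos, training_cohort))  # share of the whole training cohort
print(eln_adverse - adverse_h_pos)              # remaining ELN adverse-risk patients (H-)
```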

HMGA2 test validation in the NCRI AML17 cohort
To validate the ability of HMGA2 expression to enhance risk stratification in an independent cohort, the prognostic value of H + was assessed in the UK NCRI AML17 cohort using the same RT-qPCR assay and cutoff (Table 1). Consistent with our findings, H + was a strong predictor of a lower frequency of CR (70% vs 85.6%, P = 0.002, Table 1), poor survival (5-year OS: 21% vs 51%, P < 0.001) and a higher risk of relapse (5-year RFS: 21% vs 44%, P < 0.001 and 5-year CIR: 60% vs 46%, P = 0.003) in the validation cohort (Table 3, Fig. 3b). Multivariable logistic and Cox regression analyses were used to examine the effect of HMGA2 expression adjusted for these known prognostic variables: age, log WBC count, secondary disease, WHO/ECOG performance status, presence of adverse cytogenetics, FLT3-ITD and NPM1 mutations. These results confirmed that H + was significantly and independently associated with lower CR/CRi (CR with incomplete hematologic recovery) frequency (aOR = 3.98; 95% CI, 1.36-11.65; P = 0.010), worse OS (aHR = 2.03; 95% CI, 1.36-3.03; P < 0.001), and RFS (aHR = 2.06; 95% CI, 1.38-3.08; P < 0.001) and a higher CIR (aHR = 2.01; 95% CI, 1.28-3.14; P = 0.002) (Table 3). The utility of the HMGA2 test was also evaluated in AML patients classified using a clinical risk score to identify high-risk patients.
High-risk disease was defined according to the NCRI multi-parameter risk score, based upon baseline characteristics and response to the first course of induction chemotherapy 30,31 (detailed in Supplementary Information, Statistical Methods section). Importantly, among the 157 patients not classified in the NCRI high-risk category, 52 (33%) H + patients had a significantly worse survival than 105 H− patients (P = 0.002) (Fig. 3d).
Table footnote: aOR, adjusted odds ratio; CI, confidence intervals; HMGA2+, high expression (≥1100 NCN); HMGA2−, low expression (<1100 NCN); ITD, internal tandem duplication; WBC, white blood cells (× 10^9/l). As the non-linear effect of age at diagnosis is represented jointly by the two coefficients (linear and quadratic), the interpretation of each coefficient separately is not appropriate. See statistical methods (Supplementary Information) for description of the adjusted effect of age at diagnosis.
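For readers who wish to check that an adjusted ratio, its confidence interval and its P value are mutually consistent, the sketch below back-calculates a Wald z statistic from a reported estimate and 95% CI. It assumes that the interval is symmetric on the log scale (a Wald interval), which is a common convention but is not stated in the text; small discrepancies can arise from rounding of the published values.

```python
import math


def wald_from_ci(estimate, lower, upper, z_crit=1.959964):
    """Back-calculate (log-scale SE, z, two-sided P) from a ratio and its 95% CI.

    Assumes a Wald interval: log(estimate) +/- z_crit * SE.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(estimate) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided P from the standard normal
    return se, z, p


# Values reported for the validation cohort.
for label, est, lo, hi in [
    ("OS aHR", 2.03, 1.36, 3.03),
    ("CIR aHR", 2.01, 1.28, 3.14),
]:
    se, z, p = wald_from_ci(est, lo, hi)
    print(f"{label}: SE={se:.3f}, z={z:.2f}, P={p:.4f}")
```

Under this assumption, the OS estimate gives a P value below 0.001 and the CIR estimate a P value close to 0.002, in line with the reported values.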

Discussion
HMGA2 encodes a member of the HMGA family of proteins implicated in chromatin remodeling and transcription regulation. It is overexpressed in many human solid tumors and its upregulation was thought to be potentially associated with tumor progression and poor prognosis. 32,33 This study reports the strong negative prognostic impact of HMGA2 overexpression in AML, thus justifying the development and validation of a rapid, simple and inexpensive RT-qPCR test, also optimized on the droplet digital PCR platform, which can now be implemented in clinical laboratories. Our findings reveal that high HMGA2 expression confers a significantly higher probability of primary refractory disease after an anthracycline and cytarabine based induction chemotherapy. Interestingly, in the training cohort, the HMGA2 test also reclassified 17.7% of intermediate cytogenetic risk patients into a poor risk group. These results were confirmed in the validation cohort in which 33% of patients not classified in the NCRI high-risk category were H + and had a significantly worse survival than H − patients. This new knowledge could guide clinicians to consider offering more intensive or novel consolidation therapies for these patients.
Fig. 4 HMGA2 is an independent prognostic factor of poor outcome in AML. Forest plot for multivariable analyses of overall survival, relapse-free survival and cumulative incidence of relapse in the training cohort. aHR, adjusted hazard ratio; CI, confidence intervals; HMGA2+, high expression (≥1100 NCN); HMGA2−, low expression (<1100 NCN); HSCT, allogeneic hematopoietic stem cell transplantation; ITD, internal tandem duplication; WBC, white blood cell counts (× 10^9/l). As the non-linear effect of age at diagnosis is represented jointly by the two coefficients (linear and quadratic), the interpretation of each coefficient separately is not appropriate and not shown in the figure.
Data presented in this study also highlight the possibility that HMGA2 expression status may predict outcome following allo-HSCT, although our study does not have the power to fully address this issue.
Importantly, in a subgroup of ELN adverse genetic risk patients, a positive HMGA2 test could also predict resistance to standard treatments including allogeneic stem cell transplantation. However, these results require further validation in other AML cohorts with comprehensive mutation profiling data and classified according to the 2017 ELN genetic risk categories. Future prospective studies will determine if specific therapeutic strategies such as investigational new drugs or novel transplantation methods can improve the clinical outcome of HMGA2 positive patients.
Although age, mutations, and cytogenetic characteristics affect patient survival in AML, we demonstrate that expression of a single gene, HMGA2, is an independent prognostic factor in multivariable analyses in two independent AML cohorts. Moreover, HMGA2 appears to integrate the negative prognostic value conferred by complex karyotype and several poor risk mutations and could simplify prognostic assessment of positive cases. However, the test did not capture all poor prognosis patient subgroups. For example, MLL rearrangements and the poor prognostic NPM1 + FLT3-ITD + DNMT3A + subset 3 (~7% and ~12.5% in the Leucegene cohort, respectively) were frequently associated with low expression levels of HMGA2. Based on these findings, we propose a new algorithm integrating the HMGA2 test in current strategies for AML prognostic assessment (Supplementary Figure S10). Validation of this algorithm in clinical trials is warranted.
In conclusion, this study showed that high HMGA2 expression adds significant independent prognostic value to known clinical and genetic prognostic factors in AML, and is predictive of poor clinical outcomes with standard AML therapies. The HMGA2 test could complement the current AML tests to improve treatment orientation and be integrated in ongoing and future prospective clinical trials studying innovative therapies to increase survival of HMGA2 positive AML patients.