A validated composite organ and hematologic response model for early assessment of treatment outcomes in light chain amyloidosis.

Newly diagnosed AL amyloidosis patients were evaluated to develop a model for early assessment of treatment benefit at 6 months, integrating both hematologic response (HR) and organ response (OR) assessment (testing cohort, Mayo: n = 473; validation cohort, Pavia: n = 575). Multiple OR were assessed as follows: all OR (AOR): response in all organs; mixed OR (MOR): response in some organs; no OR (NOR). AOR rates at 6 months improved with deepening HR: complete response (CR; 38%, 35%), very good partial response (VGPR; 30%, 26%), and partial response (PR; 16%, 21%), respectively. A composite HR/OR (CHOR) model was developed using incremental scoring based on hazard ratios, with scores of 0–3 for HR (0 = CR, 1 = VGPR, 2 = PR, 3 = no response) and 0–2 for OR (0 = AOR, 1 = MOR, 2 = NOR). Patients could be divided into two distinct CHOR groups (scores 0–3 and 4–5), with median OS in group 1 vs. group 2 of not reached vs. 34 months, p < 0.001 (Mayo) and 87 vs. 23 months, p < 0.001 (Pavia). In conclusion, we developed a model that can assess multiple organs concurrently and integrate both HR and OR assessments to determine early clinical benefit with treatment, which may be used as a surrogate end-point in trials and to compare outcomes with different therapies.


Introduction
Deposition of misfolded light chains secreted by the plasma cell clone leads to organ dysfunction in patients with light chain (AL) amyloidosis [1][2][3] . The most commonly affected organs include the heart, kidney, and liver, and many patients have involvement of more than one organ [3][4][5] . Prognosis depends both on the severity of organ involvement, especially the heart, and the underlying plasma cell burden [6][7][8] . Treatment is targeted toward the plasma cell clone 2,[9][10][11][12][13] . In most patients, the organ dysfunction is the main driver of morbidity and mortality and the plasma cell burden is usually low 6,8,14,15 . Given this, it would be ideal to assess treatment efficacy by its impact on organ improvement. However, time to organ response (OR) can be varied and is usually delayed 5 . Therefore, treatment efficacy, especially early on, is typically determined by hematologic response (HR) 16 .
Deep HR increases the likelihood of OR and long-term survival, but this is not always the case and there is interpatient variability in the relationship between depths of HR and OR 5,17 . No model currently exists to integrate the two assessments for clinical use. This makes early assessment of treatment benefit difficult in this disease, preventing relatively rapid evaluation of clinical trial results and precluding the design of clinical trials for timely intervention in patients likely to have a poor outcome with ongoing therapies. Early identification of patients who are not likely to benefit from a given therapy is increasingly important, as these patients have an inferior survival and more treatment options are becoming available for them 3 . A composite model that takes into account both HR and OR at a given time point may allow for early assessment and thus become a useful surrogate endpoint for clinical trials. It may also identify patients who have not achieved a deep HR, but can safely continue first line therapy if they have achieved an OR. Such a surrogate model may also be helpful in light of new, emerging therapies that target the amyloid fibril and can potentially lead to earlier OR 18,19 .
In this study, we have developed and validated a composite model that integrates OR and HR in AL amyloidosis to define a surrogate end point for use in treatment trials.

Study population
Patients with biopsy-proven, newly diagnosed AL amyloidosis with involvement of the heart, liver, or kidney who received treatment were included. Amyloid deposits were confirmed as AL type by electron microscopy immunohistochemistry 20 or mass spectrometry 21 .

Response assessment
In evaluable patients, HR was assessed using validated criteria 16 . Patients who had a difference in involved and uninvolved free light chains (dFLC) <5 mg/dL were assessed for response by the criteria of complete response (CR) only, based on recent data [22][23][24] . Organ involvement and response were assessed using existing criteria as described in supplementary data 16,[25][26][27] . OR was classified as all organ response (AOR): response in all of the involved and evaluable organs (heart, kidney, liver); mixed organ response (MOR): response in at least one but not all of the organs; and no organ response (NOR). Patients were assessed for response at the 6-month (±2 months) and 12-month (±2 months) time-points in the Mayo cohort and at the 6-month time-point (±2 months) in the Pavia cohort.
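The AOR/MOR/NOR classification above can be sketched in a few lines of Python. This is an illustrative sketch only, not the study's code: the function name `classify_organ_response` and the dictionary input format are assumptions, and the per-organ response flags are taken as already determined by the published organ response criteria.

```python
from typing import Dict


def classify_organ_response(responses: Dict[str, bool]) -> str:
    """Classify combined organ response.

    `responses` maps each involved and evaluable organ (e.g. "heart",
    "kidney", "liver") to True if that organ met response criteria.
    The input format is a hypothetical choice made for this sketch.
    """
    if not responses:
        raise ValueError("at least one involved, evaluable organ is required")
    responded = sum(1 for ok in responses.values() if ok)
    if responded == len(responses):
        return "AOR"  # response in all involved organs
    if responded > 0:
        return "MOR"  # response in some, but not all, involved organs
    return "NOR"      # no organ responded


print(classify_organ_response({"heart": True, "kidney": False}))
```

Under this definition a patient with a single involved organ can only be AOR or NOR; MOR requires at least two involved organs.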

Combined hematologic and OR (CHOR) model
A model for CHOR was developed using the Mayo Clinic test cohort (Fig. 1). Patients were assigned scores of 0-3 for HR as follows: 0-CR, 1-very good partial response (VGPR), 2-partial response (PR), 3-no response (NR) or progression. Patients who had dFLC <5 mg/dL were assigned a score of 0 for CR and 1 for other response, as OS for the latter group was most similar to that of patients achieving VGPR. OR was scored as follows: 0-AOR, 1-MOR, and 2-NOR. The two scores were summed to give a composite score of 0-5. Hazard ratios for OS were calculated for scores 1-5 relative to a score of 0 (complete OR and HR) to construct groups based on similar hazard ratios. Patients were then divided into two groups: CHOR group 1 (scores 0-3) and CHOR group 2 (scores 4-5).
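As a worked illustration of the scoring just described, the following Python sketch adds the hematologic and organ scores and maps the total to a CHOR group. The names `chor_score` and `chor_group`, the string labels (including "PD" for progression), and the `low_dflc` flag are hypothetical. Scoring every non-CR status as 1 in patients with dFLC <5 mg/dL is our reading of "1 for other response" and should be treated as an assumption.

```python
# Hematologic response scores; "PD" (progression) is a hypothetical label
# scored like no response, as described in the text.
HEMATOLOGIC_SCORE = {"CR": 0, "VGPR": 1, "PR": 2, "NR": 3, "PD": 3}
ORGAN_SCORE = {"AOR": 0, "MOR": 1, "NOR": 2}


def chor_score(hematologic: str, organ: str, low_dflc: bool = False) -> int:
    """Return the composite score (0-5) as hematologic score + organ score."""
    if low_dflc:
        # Assumption: with baseline dFLC < 5 mg/dL only CR is assessable,
        # so CR scores 0 and any other status scores 1.
        h = 0 if hematologic == "CR" else 1
    else:
        h = HEMATOLOGIC_SCORE[hematologic]
    return h + ORGAN_SCORE[organ]


def chor_group(score: int) -> int:
    """Group 1 for scores 0-3, group 2 for scores 4-5."""
    if not 0 <= score <= 5:
        raise ValueError("CHOR score must be between 0 and 5")
    return 1 if score <= 3 else 2


score = chor_score("PR", "NOR")
print(score, chor_group(score))
```

For example, a partial hematologic response with no organ response gives 2 + 2 = 4 and falls in group 2, whereas VGPR with no organ response gives 1 + 2 = 3 and remains in group 1.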

Analysis
Statistical analysis was carried out using JMP (version 12, SAS Institute Inc., Cary, NC) and Stata (version 13.1, StataCorp, College Station, TX) software for the Mayo Clinic cohort and using MedCalc Statistical Software version 18.1 (MedCalc Software bvba, Ostend, Belgium; http://www.medcalc.org) for the Pavia cohort. Chi-square and Fisher's exact tests were used to carry out univariate analysis for categorical variables, and Wilcoxon rank sum/Kruskal-Wallis tests for continuous variables. Survival analysis was carried out using the Kaplan-Meier method and the log-rank test was used to compare survival curves. A Cox proportional hazards model was used to evaluate hazard ratios for survival. 95% confidence intervals (CI) are reported. OS was defined as the time from start of treatment to death. Cox regression was used to compare the predictive power of HR, OR, and the composite CHOR model 28 . Goodness of fit of nested models was evaluated using likelihood ratio tests. Predictive power of the individual and composite models was compared using Harrell's C 29 . All hypothesis tests were two-sided, and p values below 0.05 were considered statistically significant.
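For readers unfamiliar with Harrell's C, the sketch below shows the basic pairwise definition in plain Python. It is not the routine used in JMP, Stata, or MedCalc; the function name `harrell_c` is hypothetical, and pairs with tied survival times are simply skipped, which is a simplification.

```python
from typing import Sequence


def harrell_c(times: Sequence[float], events: Sequence[int],
              risks: Sequence[float]) -> float:
    """Concordance index for right-censored survival data.

    A pair (i, j) is comparable when patient i died (events[i] == 1)
    strictly before patient j's last follow-up time. The pair is
    concordant when patient i also has the higher risk score; tied
    risk scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: risk could be, for example, the CHOR group (1 or 2).
print(harrell_c([5, 10, 15], [1, 1, 0], [2, 1, 1]))
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect ranking of survival times by risk score. A binary predictor such as a two-group classification produces many tied risk scores, each contributing one half.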

Baseline characteristics
The test cohort consisted of 473 patients from the Mayo Clinic cohort who were alive at the 6-month time point and had HR and OR data available. The validation cohort consisted of 575 patients from Pavia with response data available at the 6-month time-point. Baseline characteristics of patients in the two cohorts are summarized in Table 1. The median age at diagnosis was 63-64 years in both series, and males comprised 65% and 58% of the population in the Mayo and Pavia cohorts, respectively. The amyloidogenic light chain was lambda in 78% of patients in both cohorts. Median dFLC and bone marrow plasma cells at diagnosis were 19 mg/dL and 10% (Mayo) and 19 mg/dL and 11% (Pavia), respectively. The proportion of patients with dFLC < 5 mg/dL (or 50 mg/L) at diagnosis was 13% in the Mayo cohort and 7% in the Pavia cohort. Presence of t(11;14) on interphase fluorescence in-situ hybridization (iFISH) was noted in 53% of patients and presence of trisomy/tetrasomy in 21% of patients in the Mayo cohort. iFISH data for the Pavia cohort were not available. The most common organs involved were the heart (70% and 79%) and kidney (70% and 69%). The liver was involved in 14% and 11% of patients, respectively. The number of major organs (heart, liver, and kidney) involved in patients from the Mayo cohort was as follows: one: 54%, two: 37%, and three: 8%. In the Pavia cohort, the distribution of organ involvement was as follows: one: 47%, two: 46%, and three: 7%.
Combined OR
Combined OR in the Mayo Clinic cohort at 6 months was as follows: AOR: 26% (n = 125/473), MOR: 14% (n = 66/473), NOR: 60% (n = 282/473). OR rates improved at 12 months: AOR: 45% (n = 194/435), MOR: 12% (n = 54/435), and NOR: 43% (n = 187/435). Combined OR rates at 6 months in the Pavia cohort were as follows: AOR: 21% (n = 120/575), MOR: 18% (n = 105/575), NOR: 61% (n = 350/575). Combined OR rates increased with deeper HR. In the Mayo cohort, OR rates at 6 months for patients achieving hematologic CR
In the Mayo Clinic group, there were nine patients who had either no HR or hematologic progression at 6 months but had a concurrent OR. Three patients had both heart and renal involvement; two of them had only a renal response and one had both a heart and a renal response. The remaining six patients had renal involvement alone and had a renal response. In the Pavia cohort at 6 months, seven patients obtained a cardiac response and 24 patients had a renal response although no HR was achieved. However, in all cases a reduction of dFLC between 40% and 50% was achieved.
Table 2 describes the survival outcomes in the Mayo cohort based on the combined OR parameter for the 6-month response (Fig. 2a, b). Survival based on combined OR was analyzed for all patients (Fig. 2a) and for the subset of patients with more than one organ involved (Fig. 2b). Patients who achieved AOR at 6 months had the best outcomes, with median OS in the AOR vs. MOR vs. NOR groups being not reached vs. 81 vs. 85 months, p < 0.001. In patients with more than one organ involved, where patients could have mixed or discordant ORs, median OS in the three groups (AOR vs. MOR vs. NOR) was not reached vs. 81 vs. 52 months. These parameters were evaluated in the Pavia cohort based on 6-month OR and were predictive of OS (Table 2, Fig. 3a, b). Patients with AOR had the best survival, followed by MOR and NOR (Fig. 3a).
Subset analyses in patients with more than one organ involved showed similar results (Fig. 3b and Supplementary Fig. 2). While the absolute survival outcomes were different in the Mayo Clinic and Pavia cohorts, the survival trend and magnitude of difference were similar across the two groups. In the subset of patients with heart involvement, OS based on AOR vs. MOR vs. NOR was as follows in the Mayo cohort: not reached vs. 81 months vs. 63 months, p < 0.001, and was similar in the Pavia cohort (Table 2 and Supplementary Fig. 3).

In patients with involvement of both heart and kidney who achieved a cardiac response by 6 months, the status of renal response did not impact survival further, as shown in Supplementary data. Amongst patients with renal involvement, achievement of renal response by 6 months was associated with significantly better dialysis-free survival, with 88% vs. 65% of patients remaining dialysis free at 5 years, p < 0.001 (Mayo cohort).

CHOR model
A composite score (CHOR) was developed based on HR and OR as described in the "Methods" section (Fig. 1). Patients with a score of zero were those who achieved a hematologic CR as well as response in all organs. Patients were divided into two groups based on the hazard ratio for survival (Mayo cohort). The groups were as follows: group 1: scores of 0-3 (N = 349), group 2: scores of 4-5 (N = 124). As illustrated in Fig. 4a, b, patients in CHOR group 1 had significantly better survival than those in group 2 (median OS: not reached vs. 34 months, p < 0.001), with a hazard ratio of 3.4 (2.5-4.6), p < 0.001. This model was then validated in the Pavia cohort, where median OS for patients in CHOR group 1 vs. 2 was 87 vs. 23 months, p < 0.001, with a hazard ratio of 2.8 (2.2-3.5), p < 0.001. This model was  Figs. 4 and 5).
We compared the CHOR model (group 1 vs. 2) at the 6-month time-point to (1) the HR criteria (achieving CR vs. not) and (2) achieving AOR vs. not using Cox regression, with the Mayo cohort as the training cohort and the Pavia cohort as the validation cohort. The CHOR model had significantly higher predictive power (C = 0.59) compared to the HR model (C = 0.56; with absolute difference in Harrell's C of 0.03, 95% CI, 0.01-0.06; p = 0.006) as well as when compared to the OR model (C = 0.56, with

Discussion
At present, there is no available model in AL amyloidosis that allows for concurrent analysis of hematologic and organ responses (especially responses in multiple organs) in a group of patients. Our retrospective study assessed response with treatment in two independent cohorts of patients with AL amyloidosis to develop and validate a model integrating simultaneous assessment of both HR and OR. Importantly, the model was able to predict OS in both cohorts, with greater predictive power compared with HR or OR assessed in isolation. This model can be used as a surrogate endpoint for rapid assessment of clinical trials. This would allow for shorter duration of follow-up and enable faster completion of these studies. This model can be incorporated in studies designed to make early treatment changes based on response. It can also be easily integrated into clinical practice for prognostication and integrating data across different therapeutics for clinical decision making.
The testing and validation cohorts were large independent cohorts from amyloidosis referral centers with long-term follow-up data. The majority of patients in both cohorts had cardiac involvement and about one-half had involvement of more than one major organ. Overall treatment patterns observed in our cohorts are similar to other cohorts reported over this time period 27,30 . Rates of transplant were strikingly different in the two cohorts (Mayo: 41%, Pavia: 1%), which suggests that the results and the CHOR model are generalizable to patients managed with different treatment approaches.
As there is no current method to evaluate multiple organ responses simultaneously, we first developed a combined parameter to assess OR. In both cohorts, patients who achieved response in all organs (AOR) had significantly better OS than those achieving response in some (MOR) or none (NOR) of the involved organs. When comparing the MOR and NOR subgroups, there was no difference in OS in the overall Mayo Clinic cohort. However, there was clearly a significant difference between the MOR and NOR groups when evaluating patients with more than one organ involved, which is the group where mixed or discordant organ responses are possible. This OR endpoint was then combined with HR in a simple, easy to use CHOR model which scored patients from low to high according to whether they achieved response or not. This scoring was derived from hazard ratios for survival from Cox proportional hazards analysis. Patients could be categorized into two distinct groups with different survival outcomes based on OR and HR assessment at the 6-month landmark in both cohorts. This model was able to distinguish between patients at the 12-month landmark as well. Moreover, this composite model had better predictive power for OS than either HR or OR in isolation in both the test and validation cohorts. The absolute survival outcomes in various groups were different in the Mayo Clinic and Pavia cohorts. These differences are likely attributable to several factors, including the differences in treatment, specifically the rates of stem cell transplant, which were strikingly different (41% vs. 1%), and possibly the responses in individual organs, particularly cardiac response, which is the major driver of survival in AL amyloidosis. Further, while the absolute survival outcomes differed in the two cohorts, the general magnitude of difference was similar.
In subset analysis of patients with cardiac involvement, the composite model had better predictive value compared with HR, but not cardiac response. This may be due to lack of adequate power or alternatively, cardiac response may be the main driving factor impacting survival. On the other hand, in the subset of patients with renal involvement, the CHOR model performed better compared to achieving renal response in predicting patient survival, but not when compared to HR. This may again be due to lack of adequate power or the fact that achieving renal response does not impact OS 27 . However, renal response remains important as it is a strong predictor for renal failure requiring dialysis as shown in our study and prior reports 27 . As over two-thirds of patients with AL amyloidosis can have involvement of more than one organ 4 , the combined CHOR model, with superior predictive value would be applicable to all patients with AL amyloidosis.
Overall, patients who achieve both OR and HR early in disease course have superior outcomes. This finding is reassuring and the development of a model which can be used systematically to assess both responses is a novel contribution of our study. The current model is able to integrate the relative improvements in hematological and organ parameters to provide a unified readout that can be used in clinical trials, as well as in clinical practice for potentially altering treatment approaches. Our study has limitations given its retrospective design, and heterogeneous nature of treatment received by patients. Moreover, the survival outcomes of patients in the two cohorts are different, likely related to baseline risk and differences in therapies used. However, development of a model in a real world scenario with an independent validation cohort results in wider applicability of the model. Previous cohorts of patients that have been used for development of HR and OR criteria for amyloidosis have also been treated in a heterogeneous manner 16,27 .
In conclusion, we have developed a model in AL amyloidosis to assess multiple organs concurrently, as well as to integrate both HR and OR assessments to determine early clinical benefit with treatment, supporting its use as a surrogate end-point in clinical trials and for comparing outcomes with different therapeutic approaches. Future studies incorporating this endpoint should be designed to evaluate the utility of changing treatment in patients not achieving this endpoint.