The Impact of MRI Features and Observer Confidence on the Treatment Decision-Making for Patients with Untreated Glioma

In a blinded, dual-center, multi-observer setting, we identify the pre-treatment radiologic features on Magnetic Resonance Imaging (MRI) associated with subsequent treatment options in patients with glioma. The study included 220 previously untreated adult patients from two institutions (94 + 126 patients) with a histopathologically confirmed diagnosis of glioma after surgery. Using a blinded, cross-institutional and randomized setup, four expert neuroradiologists recorded radiologic features, suggested glioma grade and corresponding confidence. The radiologic features were scored using the Visually AcceSAble Rembrandt Images (VASARI) standard. Results were retrospectively compared to patient treatment outcomes. Our findings show that patients receiving a biopsy or a subtotal resection were more likely to have a tumor with pathological MRI-signal (by T2-weighted Fluid-Attenuated Inversion Recovery) crossing the midline (Hazard Ratio; HR = 1.30 [1.21–1.87], P < 0.001), and those receiving a biopsy sampling more often had multifocal lesions (HR = 1.30 [1.16–1.64], P < 0.001). For low-grade gliomas (N = 50), low observer confidence in the radiographic readings was associated with less chance of a total resection (P = 0.002) and correlated with the use of a more comprehensive adjuvant treatment protocol (Spearman = 0.48, P < 0.001). This study may serve as a guide to the treating physician by identifying the key radiologic determinants most likely to influence the treatment decision-making process.

Patients and treatment options. We retrospectively included 220 previously untreated adult patients from two institutions on different continents. Of these, 94 patients came from institution A (49 males, median age 49 years, range 23-79 years), and 126 patients came from institution B (69 males, median age 56 years, range 21-87 years). Institution A is a national referral site for brain tumor patients and accounts for almost 50% of all patients in the country. All patients were referred to a diagnostic, contrast-enhanced MR exam between 2003 and 2012, before surgery and subsequent histopathological diagnosis of a WHO grade II-IV glioma (Fig. 1). Patient demographics, histopathological assessments and Karnofsky Performance Status (KPS) at the time of MRI are summarized in Table 1. Per study protocol, treatment options were reviewed up until the end of August 2013 using patient medical records and hospital registry systems. The treatment data included (Table 2): steroid use at the time of MRI, type of surgery (biopsy, subtotal resection <90% or gross total resection >90%), type of adjuvant therapy (fractionated radiotherapy, chemotherapy, combined chemoradiation and/or anti-angiogenic therapy with bevacizumab) 11, and the number of reoperations (no/one/multiple) within the study period. Post-surgery treatment was decided using all information available to the treating physicians at the time, including MRIs, clinical information and histopathological diagnosis.
For institution B, the imaging protocol 13 mirrored that of institution A, including concurrent diffusion MRI in 119 of 126 patients. All patients had DSC-MRI with either a single or a double dose of 0.1 mmol/kg gadopentetate dimeglumine (Magnevist®, Bayer Schering Pharma, Berlin, Germany) at a rate of 5 mL/sec followed by 20 mL of saline (sequence details shown in Table 1).
Image pre-processing. All image pre-processing was performed using nordicICE (NordicNeuroLab AS, Bergen, Norway) or the open source platform 3D Slicer. As previously described 14, a board-certified neuroradiologist at each institution identified tumor regions on conventional MRIs using manual outlining in nordicICE for institution A, and a semi-automatic approach in 3D Slicer 15 for institution B. Using nordicICE, apparent diffusion coefficient (ADC) maps were estimated using Stejskal-Tanner diffusion approximation 16 and DSC-MRI data were automatically processed to create relative cerebral blood volume (rCBV) maps using standard kinetic modelling and corrected for contrast agent extravasation 17. All rCBV maps were normalized to normal-appearing tissue 18 and presented as semi-transparent color overlays on anatomical MRIs.
Observers. Two neuroradiologists with >10 and >15 years of clinical experience with brain MRI were included from institution A. Two additional neuroradiologists were included to represent institution B, one with >10 years of clinical experience with brain MRI and working at institution B. The other observer representing institution B was a consultant neuroradiologist from a third institution with >5 years of clinical experience with brain MRI. The observers from institution A reviewed the MRI data of institution B, and vice versa.
First MRI reading. Using anatomical and diffusion MRI only, the observers recorded their scores from visual grading using the Visually AcceSAble Rembrandt Images (VASARI) standardized feature set 6 through the joint National Institutes of Health (NIH) and National Cancer Institute (NCI) initiated Annotation and Image Markup (AIM) scoring template for gliomas. The VASARI scoring system includes 19 semantic descriptors of imaging features of brain tumors (Supplementary Table 1). Tumor location was also assessed by inclusion (Yes/No) of the parietal, frontal, occipital and temporal lobes, cerebellum, the basal ganglia, thalamus, insula, as well as deep white matter involvement including corpus callosum and internal capsule.
The observers then classified the gliomas per suggested WHO grade, and recorded the corresponding level of confidence using a four-level classification scheme: (I) doubtful (<50% certainty), (II) somewhat confident (50-70% certainty), (III) very confident (70-90% certainty), and (IV) extremely confident (>90% certainty). To mimic a real-world clinical situation, patient age and the presenting neurologic symptoms as written on the admission record were available to the observer, while the observers were blinded to all other information.
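The four-level confidence scheme above can be written as a simple lookup. The following is a minimal sketch, not part of the study software; the function name is hypothetical, and because the stated ranges share the boundaries 70% and 90%, the assignment of exactly 70% and exactly 90% to level III is an assumption.

```python
def confidence_level(certainty_percent: float) -> int:
    """Map a certainty percentage to the four-level confidence scale.

    Returns 1 (doubtful), 2 (somewhat confident), 3 (very confident)
    or 4 (extremely confident).
    """
    if not 0 <= certainty_percent <= 100:
        raise ValueError("certainty must lie between 0 and 100")
    if certainty_percent < 50:
        return 1  # (I) doubtful: <50% certainty
    if certainty_percent < 70:
        return 2  # (II) somewhat confident: 50-70% (assumption: 70% goes to level III)
    if certainty_percent <= 90:
        return 3  # (III) very confident: 70-90% (level IV requires >90%)
    return 4      # (IV) extremely confident: >90% certainty


if __name__ == "__main__":
    for value in (35, 50, 70, 90, 95):
        print(value, "->", confidence_level(value))
```

Storing the level as an integer from 1 to 4 matches the numeric confidence averages reported in the Results.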
Second reading. After an interval of at least one month, the observers were again presented with the MRI data to perform a second, repeated reading, with a re-shuffled patient sequence. The two readings were averaged to compensate for intra-observer variability. Immediately after the second reading, the observers were also presented with DSC-MRI data. The observers then re-examined all available imaging data using rCBV color maps at will, and recorded a third set of glioma grades and confidence levels.
Statistical analyses. Any institutional differences in patient demographics and MRI findings were assessed using independent samples two-sided t-tests. Treatment endpoints were tested for associations by stepwise linear regression models (ANOVA), or by rank tests (Mann-Whitney or Kruskal-Wallis) and Spearman correlation if parametric assumptions were not met. Model inputs included patient gender, age at time of MRI (years), KPS (%), total tumor volume and edema volume (cubic centimeters), and treatment options. The stepwise linear model was halted at the first value not passing significance.
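As an illustration of the rank-based Spearman correlation named above, the sketch below computes the coefficient as the Pearson correlation of tie-averaged ranks. It is a plain-Python illustration under stated assumptions, not the SPSS routine used in the study; the function names and the example values are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical values: observer confidence versus an ordinal treatment code.
    confidence = [2.50, 2.00, 3.25, 1.75, 2.25]
    treatment_code = [0, 2, 0, 3, 1]
    print(round(spearman_rho(confidence, treatment_code), 3))
```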
Intra-observer reproducibility was assessed by glioma grade and observer confidence using intra-class correlation coefficients (ICC) between the first and second readings of the conventional MRIs only (adding rCBV invalidates the test-retest scheme). Moreover, to overcome dependence on observer and institutional variations, the data were also pooled into a single, 220-patient cohort. A patient was labeled according to the average score between the first and second readings of each observer, and thereafter across the two observers at the same institution. For missing data by one or more observers, the recorded value of the remaining observer was used.
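The pooling rule described above can be sketched as follows. This is an illustrative reading of the text, not the study's code: the function names are hypothetical, `None` marks a missing reading, and the fallback to a single available reading within one observer is an assumption (the text only specifies the fallback across observers).

```python
from statistics import mean


def observer_score(first, second):
    """Average one observer's first and second reading.

    Assumption: if only one reading is available, that reading is used.
    Returns None when the observer has no reading at all.
    """
    readings = [r for r in (first, second) if r is not None]
    return mean(readings) if readings else None


def pooled_score(observers):
    """Average the per-observer scores across the observers of one institution.

    `observers` is a list of (first_reading, second_reading) tuples.
    An observer with no data is skipped, so the remaining observer's
    value is used, as described in the text.
    """
    per_observer = [observer_score(first, second) for first, second in observers]
    available = [score for score in per_observer if score is not None]
    return mean(available) if available else None


if __name__ == "__main__":
    # Hypothetical confidence readings for one patient and two observers.
    print(pooled_score([(2, 3), (3, 3)]))
    # Second observer missing: the first observer's average is used.
    print(pooled_score([(2, 3), (None, None)]))
```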
Statistical analyses were performed using SPSS 22 (SPSS Inc., USA). P-values below 0.05 were considered significant, and all tests with multiple comparisons were Bonferroni corrected (P = 0.05/number of tests).
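The Bonferroni rule stated above (P = 0.05/number of tests) amounts to the following small check. This is a minimal sketch with hypothetical function names, and the number of tests in the example is illustrative only.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def is_significant(p_value: float, n_tests: int, alpha: float = 0.05) -> bool:
    """True if p_value passes the Bonferroni-corrected threshold."""
    return p_value < bonferroni_threshold(alpha, n_tests)


if __name__ == "__main__":
    # Illustrative only: 19 tests, one per VASARI descriptor.
    print(bonferroni_threshold(0.05, 19))
    print(is_significant(0.002, 19))
```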

Results
Institutional differences in patient data and treatment options. The ratio of low-grade to high-grade gliomas was significantly higher for institution A compared to institution B (49% versus 18%, P = 0.003). However, the distributions of WHO grades within the institutions were similar (not passing Bonferroni). The average KPS was correspondingly higher at institution A (90.53% ± 11.20% versus 86.59% ± 9.48%, P = 0.005), and steroids during initial MRI were administered more frequently (53% versus 17%, P < 0.001). The number of gross-total resections in WHO grade II gliomas was lower for institution A compared to institution B (Table 2; 13% versus 58%, P = 0.018). Patients at institution A were also more likely to have repeated surgery (all patients: 26% versus 9%, P = 0.002), owing to higher repeated surgery of WHO grade II gliomas (42% versus 0%, P < 0.001). Chemotherapy as a monotherapy was only administered at institution A, while anti-angiogenic drugs were only administered at institution B. All patients on anti-angiogenic therapy also received combined chemo-radiation.
Observer outcome measures and intra-observer repeatability. An overview of the average WHO grade and corresponding confidence levels of all observers is shown in Table 3. At both institutions and for all observers, the proposed WHO grades did not change over the course of the study readings. In contrast, the confidence scores increased significantly for all observers with the addition of the rCBV map (at the P < 0.001 level, all observers, Table 3). Moreover, for institution A, the ICCs when grading patients by WHO grades II-IV were 0.93 (P < 0.001, n = 91) and 0.88 (P < 0.001, n = 73) for the two observers, respectively. The ICCs of the corresponding confidence scores were lower at 0.7381 (P < 0.001, n = 91) and 0.58 (P < 0.001, n = 73), respectively.
For institution B, the ICCs when grading patients by WHO grades II-IV were 0.69 (P < 0.001, n = 120) and 0.83 (P < 0.001, n = 124) for observers 1 and 2, respectively. Again, the ICCs of the corresponding confidence scores were lower at 0.35 (P < 0.001, n = 120) and 0.25 (P = 0.002, n = 124) for observers 1 and 2, respectively.
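For readers who want to reproduce a test-retest ICC like those reported above, the sketch below computes a two-way, consistency-type, single-measure coefficient, often written ICC(3,1), from paired first and second readings. The ICC model actually selected in SPSS is not stated in the text, so this choice is an assumption; the function name and example scores are hypothetical.

```python
def icc_consistency(ratings):
    """ICC(3,1): two-way model, consistency, single measures.

    `ratings` is a list of rows, one row per patient, each row holding
    the k repeated scores for that patient (here k = 2 readings).
    """
    n = len(ratings)
    k = len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need at least 2 patients and 2 readings per patient")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denominator = ms_rows + (k - 1) * ms_error
    if denominator == 0:
        raise ValueError("ICC is undefined when all scores are identical")
    return (ms_rows - ms_error) / denominator


if __name__ == "__main__":
    # Hypothetical WHO grades from a first and second reading of five patients.
    readings = [[2, 2], [3, 3], [4, 4], [3, 4], [2, 2]]
    print(round(icc_consistency(readings), 3))
```

A consistency-type coefficient ignores a constant offset between the two readings; an absolute-agreement form would penalize it, which is why the choice of model should be reported alongside the value.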
Institutional differences in MRI features. The average size of the tumors was smaller (37.63 ± 33.34 mL versus 68.42 ± 58.87 mL, P < 0.001), while the average peritumoral edema region was larger (53.43 ± 47.01 mL versus 34.38 ± 50.90 mL, P = 0.012) at institution A compared to institution B. Matched for WHO glioma grade, ependymal extension was the only imaging feature from the first MRI reading separating patients of institution A from institution B (67% versus 17%, P < 0.001). For WHO grades II and IV only, the non-enhancing tumor margins were less well-defined at institution A compared to institution B (30% versus 73%, P < 0.003). There was no significant difference in the enhancement quality between 1.5 and 3 Tesla systems (Table 1) (P > 0.79; both observers), nor between a single- versus double-dose contrast agent administration (P > 0.79). Table 4 highlights the pan-institutional imaging features associated with the choice of surgical procedure. In short, by VASARI, patients with a biopsy or a subtotal resection were more likely to have a tumor with pathologic FLAIR/T2 signal crossing the midline, and those receiving a biopsy sampling more often had multifocal or multicentric lesions. Patients with low post-contrast enhancement quality (0-1) had a longer time between the pre-surgical MRI and subsequent surgery (trimmed mean = 10.84 days versus 3.92 days, P < 0.01, hazard ratio [95% confidence interval]; HR = 1.21 [1.06-1.38]). For WHO grade IV glioblastomas only (N = 121), lack of FLAIR/T2 signal crossing the midline and no satellite enhancement foci were associated with a gross total resection (P < 0.001, HR = 1.37 [1.17-1.61]).

Associations between MRI features and subsequent neurosurgery.
For institution A, patients with a biopsy sampling or a subtotal resection more often had tumors with an infiltrative T1/FLAIR ratio (1.94 ± 0.71 and 1.91 ± 0.68 versus 1.41 ± 0.54, P = 0.008) and less pial invasion (16% and 41% versus 53%, P = 0.002) compared to those with a total resection. Findings of institution B mirrored the pan-institution analyses (Table 4).

Associations between MRI features and adjuvant therapy.
Comparing patients on combined chemo-radiation with or without additional anti-angiogenic therapy (institution B), patients receiving bevacizumab (N = 29) were more likely to have a tumor in the deep brain, with more frequent involvement of the thalamus (28% versus 11%, P = 0.002) and less frequent involvement of the temporal lobes (31% versus 56%, P = 0.002). Figure 2 shows representative MRI of two patients with low and high observer confidence, respectively. Steroid use at the time of MRI (N = 72) was associated with higher observer confidence when evaluating anatomical MRI only (2.79 ± 0.75 versus 2.43 ± 0.64, P < 0.001). Interestingly, when adding DSC-MRI, the confidence level for the non-steroid group (N = 148) increased significantly (from 2.43 ± 0.64 to 3.03 ± 0.80, paired t-test: P < 0.001), and to a higher level than the steroid group (3.03 ± 0.80 versus 2.79 ± 0.75, P < 0.001). Moreover, adding DSC-MRI reduced the number of VASARI features associated with low confidence (score 1-2) at the P < 0.001 level (median 3 versus 0 features; Supplementary Table 1). For WHO glioma grades II and III (N = 99), observer confidence was associated with the choice of surgical procedure. The lowest observer confidence was found in patients receiving a biopsy only (2.02 ± 0.44) compared to subtotal resection (2.25 ± 0.50) and gross total resection (2.40 ± 0.44, P = 0.002).

Associations between observer confidence and treatment.
Finally, for low-grade gliomas (N = 50), the use of adjuvant therapy was inversely correlated with observer confidence (Spearman = −0.48, P < 0.001). Patients not receiving any adjuvant therapy had the highest observer confidence (median 2.50, range 2.00-3.25, N = 22), whereas patients on combined chemo-radiation with or without anti-angiogenic therapy had the lowest observer confidence (median 2.00, range 1.75-3.5, N = 13).
Table 4. Associations between MRI features and subsequent neurosurgery (both institutions). Note. ***Significant features at the P < 0.001 level (Bonferroni corrected). **Significant features at the P < 0.01 level (Bonferroni corrected). # Observer scorings deemed indeterminate were excluded from analysis. Highest incidence/value highlighted in bold.
Table 5. Associations between MRI features and adjuvant therapy (both institutions). Note. ***Significant features at the P < 0.001 level (Bonferroni corrected). **Significant features at the P < 0.01 level (Bonferroni corrected). # Observer scorings deemed indeterminate were excluded from analysis. Highest incidence/value highlighted in bold.

Discussion
In a blinded and retrospective, dual-center, multi-observer study, we quantify pre-treatment radiologic features by MRI that are systematically associated with the outcome of subsequent treatment of adult patients with gliomas and are therefore likely to play a key role in the clinical decision-making process. The choice of surgical intervention was associated with the complexity of tumor infiltration, and patients whose tumors had indiscernible contrast-enhancement patterns received a more conservative management, with an approximately three times longer interval between the MRI exam and subsequent surgery. Tumors with a pathologic MRI-signal crossing the midline and/or multifocal disease were more likely to have sub-total resection or especially a biopsy. This finding was also associated with lower observer confidence. Tumor progression and the need for repeated surgery were associated with pial invasion and a more edematous signature of the peritumoral region [19][20][21]. Moreover, steroid use was more often seen in patients with invasive and infiltrative tumors with poorly defined non-enhancing margins. Still, use of steroids was also linked to higher observer confidence, where the tumors' overall appearance probably showed less ambiguous imaging features. For adjuvant therapy and corrected for grade, patients not receiving any additional therapy outside surgery had the highest observer confidence. Observer confidence also returned lower ICCs than those from glioma grading. Interestingly, this suggests that treating physicians are more likely to opt for additional adjuvant treatment options for tumors in which the imaging appearance is less typical, irrespective of histological grade. This is not entirely unexpected and likely reflects physicians' understanding of the inherent limitations of the WHO grading system as the main or sole determinant of treatment decisions, a historical dogma which is being challenged and replaced by recent discoveries in glioma oncogenetics 22,23.
This finding may also warrant the need for standardized disease registries in order to learn from the decisions made and the subsequent outcomes of previous decision-making processes. Patients receiving combined chemo-radiation had well-defined non-enhancing tumor margins and lack of deep-brain involvement. Because the choice of treatment is arguably linked to glioma location 12,19,24, these imaging features probably helped identify the target area for radiotherapy. Also, the lack of deep-brain involvement is consistent with the goal of minimizing radiation damage to basic functions of the brain. Instead, patients with tumors of the deep brain were more likely to receive anti-angiogenic therapy.
Our study advocates the need for a high-quality, focused MRI protocol to complement the clinical and histopathologic data for pre-treatment assessment of patients with brain tumors. Novel imaging techniques are regularly introduced into oncologic research, with the ability to visualize new aspects of tumor pathophysiology, cellularity, metabolic profile and hemodynamic status 4,6,9,25,26. Glioma imaging protocols are therefore becoming increasingly comprehensive, time-consuming and costly, whilst quantification of any added impact on the decision-making process is still rarely performed. It can therefore be debated under what circumstances the diagnostic process is really improved in a cost-effective way by increasing the number of exams 27. In line with previous reports, adding DSC-MRI to a conventional imaging protocol improved the observers' confidence in the glioma characterization in untreated (non-steroid) patients 10. Also, with DSC-MRI, the suggestion of a lower glioma grade by the observers was associated with adjuvant radiation- or chemo-monotherapy, whereas the suggestion of a higher glioma grade was associated with combined chemo-radiation. The reduced number of VASARI features associated with a low observer confidence could potentially be seen as a time-saving feature of DSC-MRI. While comparisons of subjective confidence scores across observers with various levels of experience should be performed with care, our results indicate that DSC-MRI may aid less experienced readers. For prospective studies, introducing machine learning alternatives may help confirm or identify other relevant imaging features of the disease, and also reduce the inherent observer variations that follow complex diagnostic readings 13,28. By comparing the results of an artificial intelligence (AI) model to those of our current expert radiologic examination, we can reveal the added value of the AI model for assessment of disease.
Finally, use of AI-based model interpretability may help generate more powerful radiomics signatures from the hidden layers of the neural network beyond the radiologist-labeled, classical VASARI features 29.
Our study has some limitations. Owing to inherent regional and national determinants, differences in patient demographics between the two institutions may have influenced our results. However, such differences are also what distinguish a multi-center study from a single-institution analogue, and they extend the range of our findings beyond a single demographic setting. Moreover, while taking measures to blind the observers, a study design of this nature will never truly mimic the dynamic and complex workup of oncologic practice. Undoubtedly, the treating physician will also include information from the histopathologic analyses in the treatment decision-making process. Therefore, our findings also include imaging analyses from patients of the same WHO type and grade. Also, owing to the retrospective nature of our study, the WHO grading system of 2007 was used in histopathological diagnosis 3, and neither we, nor the treating physicians at the time, had access to the molecular profiles of individual tumors. Both mutation status of isocitrate dehydrogenase (IDH) 1/2 and methylated O6-methylguanine DNA methyltransferase (MGMT) may influence the treatment decision process, and may also potentially be determined by DSC-MRI 26,30. Furthermore, and unlike today, not all of the anatomical MRI data at the time used a 3D image readout. However, this should only to a limited extent have affected the VASARI criteria as presented in our study.
To conclude, in a comprehensive study we identify the key radiographic determinants of glioma patients associated with the treatment decision-making process. The choice of surgical intervention was associated with the complexity of tumor infiltration and low observer confidence was associated with a more extensive adjuvant treatment protocol.