Clinical Evaluation of a Fully-automatic Segmentation Method for Longitudinal Brain Tumor Volumetry

Meier, Raphael; Knecht, Urspeter; Loosli, Tina; Bauer, Stefan; Slotboom, Johannes; Wiest, Roland; Reyes, Mauricio

doi:10.1038/srep23376

Article
Open access
Published: 22 March 2016

Raphael Meier¹,
Urspeter Knecht²,
Tina Loosli²,
Stefan Bauer¹,²,
Johannes Slotboom²,
Roland Wiest² &
…
Mauricio Reyes¹

Scientific Reports volume 6, Article number: 23376 (2016)

Abstract

Information about the size of a tumor and its temporal evolution is needed for diagnosis as well as treatment of brain tumor patients. The aim of the study was to investigate the potential of a fully-automatic segmentation method, called BraTumIA, for longitudinal brain tumor volumetry by comparing the automatically estimated volumes with ground truth data acquired via manual segmentation. Longitudinal Magnetic Resonance (MR) Imaging data of 14 patients with newly diagnosed glioblastoma, encompassing 64 MR acquisitions ranging from preoperative up to 12-month follow-up images, was analysed. Manual segmentation was performed by two human raters. Strong correlations (R = 0.83–0.96, p < 0.001) were observed between the volumetric estimates of BraTumIA and those of each human rater for the contrast-enhancing tumor (CET) and the non-enhancing T₂-hyperintense tumor (NCE-T₂) compartments. A quantitative analysis of inter-rater disagreement showed that the disagreement between BraTumIA and each of the human raters was comparable to the disagreement between the two human raters. In summary, BraTumIA generated volumetric trend curves of the contrast-enhancing and non-enhancing T₂-hyperintense tumor compartments comparable to the estimates of human raters. These findings suggest the potential of automated longitudinal tumor segmentation to substitute for manual volumetric follow-up of contrast-enhancing and non-enhancing T₂-hyperintense tumor compartments.

Introduction

The accurate and reproducible measurement of tumor size and its changes over time is of crucial importance for diagnosis, treatment planning and monitoring of response to oncologic therapy for brain tumors. Current clinical guidelines (RANO/AvaGlio¹) for response assessment of high-grade glioma patients rely on bidimensional measures. Compared to tumor volumetry, bidimensional measures show several limitations: increased measurement variability^2,3,4,5, sensitivity to imaging quality⁶ and difficulties in assessing irregularly shaped, unmeasurable or satellite lesions⁷. Volumetry of a tumor requires an operator to outline the tumor and to differentiate between the different tumor compartments and peritumoral changes, which in turn requires considerable skill and expertise in tumor diagnostics as well as in handling the respective software. Consequently, manual tumor volumetry is a time-consuming procedure that is prone to subjectivity and hence to large inter-observer variability^8,9,10. Fully-automatic segmentation methods constitute a possible solution to these issues. They perform volumetry in a fraction of the time required for manual segmentation, which can take up to one hour per patient, while eliminating intra-observer and inter-observer variability.

A compartmentalisation of high-grade glioma into necrosis, edema, non-enhancing and enhancing tumor has been found to be associated with response to treatment and patient survival^11,12. Recent studies show that a standardised set of Magnetic Resonance (MR) imaging features of these tumor compartments can be used to stratify patients into different risk groups^13,14. In parallel, automatic methods capable of segmenting a high-grade glioma into its subcompartments have been proposed^15,16,17. Such methods rely on imaging information from structural MRI (usually native T₁ weighted (T₁w), T₁w gadolinium enhanced, T₂w and FLAIR sequences) and machine learning techniques for data analysis^10,18.

The majority of studies assessing the potential of computer-assisted segmentation methods for brain tumor volumetry have so far focused on preoperative segmentation. In the study of Porz et al.¹⁹, the segmentation results of an automatic method were compared with manually acquired ground truth data of two expert raters for 25 glioblastoma (GBM) patients. The comparison was performed for the complete tumor (including all four tumor compartments), the tumor core (including necrosis, contrast-enhancing and non-enhancing tumor) and the contrast-enhancing tumor. The study of Steed et al.²⁰ compares automatically and manually segmented patient cases extracted from The Cancer Imaging Archive (TCIA). The authors evaluated their method for the segmentation of the enhancing part of the tumor and the FLAIR hyperintensity volume. Both studies found good agreement between manual and automatic results. Regarding longitudinal segmentation, Weizman et al.²¹ proposed a semi-automatic method capable of subdividing low-grade brain tumors into cystic, solid and enhancing regions and of tracking the volumetric evolution of these subcompartments over time. The authors applied their method to 10 patients, comprising a total of 40 MRI scans, and found it to be accurate when compared to manual segmentations. The study of Liberman et al.²² is methodologically the closest to the study at hand. The authors evaluated an automatic segmentation method on 59 longitudinal MR scans of 13 patients with recurrent GBM undergoing bevacizumab therapy. The method segments the tumor volume into enhancing tumor volume, peri- and non-peri-tumoral edema. The focus of that study was on improving the accuracy of therapy response assessment as defined by the Macdonald criteria²³ and by manual volumetry.
In contrast to the study of Liberman et al.²², our study was performed explicitly in a prospective setting, uses data of patients with newly diagnosed GBM and provides a comparison between the volumetric trends of tumor compartments as estimated by automatic and manual segmentation.

Motivated by these developments, we hypothesise that the volumetric trend of tumor compartments in high-grade glioma as captured by an automatic segmentation method is comparable to the trend estimated by time-consuming manual segmentation.

Following previous work, we employ a fully-automated method^17,19 called BraTumIA (Brain Tumor Image Analysis). The aim of our study is to investigate the potential of BraTumIA for longitudinal brain tumor volumetry by comparing the automatically estimated volumes with ground truth data acquired by manual segmentation.

Materials and Methods

Data selection

Manual and automatic segmentations were performed on the longitudinal MRI data of 14 consecutive patients selected from two ongoing, prospective clinical trials at our institution. Imaging data of 64 independent MR acquisitions (each encompassing T₁w, T₁w gadolinium enhanced, T₂w and FLAIR sequences; three to six acquisitions per patient), resulting in a total of 256 different MRI images (64 × 4), was analysed. Only patients with newly diagnosed and histologically confirmed glioblastoma multiforme were eligible for inclusion. The same standardised MR protocol was performed for all patients. The aim of the clinical trials is to improve the reliability of tumor progression evaluation with MRI and MR Spectroscopy (MRS) before, during and after therapy with neurosurgery, radiotherapy, chemotherapy and/or anti-angiogenic therapy. Seven of the 14 patients received first-line therapy with bevacizumab in combination with radiotherapy. The main exclusion criteria were incomplete MRI data acquisition, previous cranial neurosurgery, a Karnofsky performance status lower than 70% and pathological organ function (liver, kidney, impaired hematological function). An overview of the patient data is presented in Table 1. The studies were approved by the Local Research Ethics Commission (Kantonale Ethikkommission Bern) and all methods were carried out in accordance with the approved guidelines. All patients provided written informed consent.

Table 1 Longitudinal patient data used for evaluation.

MR Acquisition

We performed a standardised MR protocol for all patients. All sequences were acquired on 1.5T MR scanners from Siemens (Siemens Avanto and Siemens Aera, Siemens, Erlangen/Germany). For manual and automatic segmentation the following sequences were used: i) 2D T₂w MRI sequence with fluid-attenuated inversion recovery pulse (T₂w FLAIR) in axial acquisition, TE = 80 ms, TR = 8000 ms, FOV = 256 × 256 mm², FA = 120°, anisotropic voxel size of 1 mm × 1 mm × 3 mm; ii) 3D T₂w SPACE in sagittal acquisition, TE = 380 ms, TR = 3000 ms, FOV = 256 × 256 mm², FA = 120°, isotropic voxel size of 1 mm × 1 mm × 1 mm; iii) 3D T₁w MPR without contrast enhancement in sagittal acquisition, TE = 2.67 ms, TR = 1580 ms, FOV = 256 × 256 mm², FA = 8°, isotropic voxel size of 1 mm × 1 mm × 1 mm; iv) 3D T₁w with gadolinium contrast enhancement in sagittal acquisition, TE = 4.57 ms, TR = 2070 ms, FOV = 256 × 256 mm², FA = 15°, isotropic voxel size of 1 mm × 1 mm × 1 mm.

Manual Segmentation

The different MR sequences were skull-stripped²⁴ and co-registered to the T₁w gadolinium enhanced sequence using a rigid transformation. This step is part of the BraTumIA software and facilitated the comparison with the automatic segmentations, which were generated from the same co-registered images. The images were then manually segmented by two human raters blinded to clinical history and diagnosis. The raters used all four MR sequences (T₁w, T₁w gadolinium enhanced, T₂w and FLAIR) simultaneously to annotate a data set. The images were strictly segmented according to their acquisition time, starting with the preoperative image and independently of the other MR acquisitions of the same patient (i.e. in a prospective fashion). Both raters adhered to a predefined segmentation protocol²⁵ and used 3D Slicer²⁶ to generate a manual segmentation of the complete tumor into necrosis, edema, non-enhancing and enhancing tumor. Rater-1 is a neuroradiologist with several years of experience in brain tumor image analysis, whereas Rater-2 is an M.D. master's student previously trained in neuroimaging with initial experience in the field. In two of the 64 MR acquisitions, the automatic skull-stripping failed and was corrected manually.

Fully-automatic Segmentation

For the purpose of automatic segmentation we employed the software BraTumIA^15,17. BraTumIA allows the user to load four standard MRI sequences (T₁w, T₁w gadolinium enhanced, T₂w and FLAIR images) that constitute a neurooncological MR protocol according to the RANO criteria²⁷. Processing starts with the generation of a brain mask used for automatic skull-stripping. This step is followed by a multimodal rigid registration in which the T₁w gadolinium enhanced image serves as the template (i.e. all other images are aligned to this image). Finally, a bias-field correction²⁸ is applied and the image intensities are normalised²⁹.

After preprocessing, a machine learning-based framework performs the segmentation via voxel-wise tissue classification. BraTumIA classifies every voxel into one of three unaffected tissue classes (gray matter, white matter and cerebrospinal fluid) or one of four tumor classes (necrosis, edema, non-enhancing and enhancing tumor). To perform this classification, the software first extracts voxel-wise feature vectors composed of appearance-sensitive features (multisequential intensities and intensity differences, first-order and gradient textures) and context-sensitive features (atlas-normalised coordinates, multi-scale symmetry features and ray features)¹⁷. Based on this feature vector, every voxel is classified by a decision forest^30,31 into one of the tissue classes. The decision forest classifier has three strengths in this setting: it can handle high-dimensional input data (BraTumIA employs a 237-dimensional feature vector), it can handle multi-class classification problems (BraTumIA performs a segmentation into seven different tissue classes) and its output is a probability distribution over the different tissue classes. The predicted tissue class for a particular voxel is the one with the highest probability. In a final step, the label map generated by the decision forest is refined by a regularisation that enforces spatial consistency of classified voxels with respect to their neighborhood through a conditional random field based optimisation¹⁵.
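The classification step described above can be illustrated with a minimal Python sketch. It assumes that each tree of a decision forest has already produced a class distribution for one voxel; the forest posterior is taken as the average of these per-tree distributions (the usual aggregation in decision forests) and the predicted label is the most probable class. The class order, the function names and the example probabilities are hypothetical, and neither BraTumIA's 237-dimensional feature extraction nor its CRF regularisation is reproduced here.

```python
from typing import List, Sequence

# Hypothetical class order: three unaffected tissue classes, then four tumor classes.
CLASSES = [
    "gray matter", "white matter", "cerebrospinal fluid",
    "necrosis", "edema", "non-enhancing tumor", "enhancing tumor",
]


def forest_posterior(tree_posteriors: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-tree class distributions of one voxel."""
    n_trees = len(tree_posteriors)
    n_classes = len(tree_posteriors[0])
    return [sum(tree[c] for tree in tree_posteriors) / n_trees
            for c in range(n_classes)]


def predict_label(posterior: Sequence[float]) -> int:
    """Return the index of the class with the highest probability."""
    return max(range(len(posterior)), key=lambda c: posterior[c])


# Hypothetical outputs of three trees for a single voxel (each row sums to 1).
trees = [
    [0.0, 0.1, 0.0, 0.0, 0.3, 0.1, 0.5],
    [0.0, 0.0, 0.0, 0.1, 0.2, 0.1, 0.6],
    [0.1, 0.1, 0.0, 0.0, 0.5, 0.1, 0.2],
]
posterior = forest_posterior(trees)
print(CLASSES[predict_label(posterior)])  # enhancing tumor
```

Keeping the full distribution rather than only the winning label is what allows a subsequent spatial regularisation step, such as BraTumIA's CRF, to weigh a voxel's own evidence against that of its neighbours.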

In contrast to Porz et al.¹⁹, the BraTumIA training data of 36 preoperative patient cases was enlarged with longitudinal imaging data sets. For a subset of nine patients, we obtained nine immediate postoperative images. Moreover, for four of these nine patients we obtained nine additional follow-up images (acquired within one to six months after surgery). This led to a total of 54 imaging data sets (36 + 9 + 9) of 36 different patients used for training.

Statistical Analysis

The statistical analysis targeted the two morphologically most discriminative tumor compartments, namely the contrast-enhancing tumor (CET) and the non-enhancing T₂-hyperintense part of the tumor (NCE-T₂), following the modified RANO recommendations^1,27. In our analysis, the latter encompasses the segmentations of the non-enhancing tumor and of edema. We did not include necrosis, because this compartment is usually completely resected and, if it reappears, often does so late after surgery and with an initially small volume.
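To make the compartment definition concrete, the following sketch derives CET and NCE-T₂ volumes from a voxel-wise label map. It assumes a hypothetical integer encoding (0 = background, 1–3 = unaffected tissue, 4–7 = necrosis, edema, non-enhancing and enhancing tumor) and a voxel volume of 1 mm³, which would match the 1 mm isotropic T₁w gadolinium enhanced reference image if the label map is defined on that grid; BraTumIA's actual label values may differ.

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical label encoding for the four tumor classes.
NECROSIS, EDEMA, NON_ENHANCING, ENHANCING = 4, 5, 6, 7


def compartment_volumes(labels: Iterable[int],
                        voxel_volume_mm3: float = 1.0) -> Dict[str, float]:
    """Return CET and NCE-T2 volumes in mm^3; necrosis is deliberately excluded."""
    counts = Counter(labels)
    cet = counts[ENHANCING] * voxel_volume_mm3
    # NCE-T2 combines non-enhancing tumor and edema into one compartment.
    nce_t2 = (counts[NON_ENHANCING] + counts[EDEMA]) * voxel_volume_mm3
    return {"CET": cet, "NCE-T2": nce_t2}


# Flattened toy label map: 3 enhancing, 2 edema, 1 non-enhancing, 1 necrosis voxel.
toy_labels = [0, 5, 5, 6, 7, 7, 7, 4, 1]
print(compartment_volumes(toy_labels))  # {'CET': 3.0, 'NCE-T2': 3.0}
```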

Our statistical analysis followed a descriptive approach. Absolute and relative volumes were plotted against time. Based on the imaging protocol, we defined seven different time points: t_pre (preoperative image), t_post (immediate postoperative image), t₁ (one-month follow-up), t₃ (three-month follow-up), t₆ (six-month follow-up), t₉ (nine-month follow-up) and t₁₂ (12-month follow-up). Absolute volumes $V_r(t)$ measured by a particular rater $r$ were expressed relative to the preoperative volume, i.e. $\tilde{V}_r(t) = V_r(t) / V_r(t_{\mathrm{pre}})$. Overlap between segmentation results was evaluated using the Dice-coefficient. Tumor volumes smaller than 90 mm³ were not considered for computing the Dice-coefficient, because the Dice-coefficient is overly sensitive to changes in small volumes.
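The Dice-coefficient of two segmentations A and B is 2|A ∩ B| / (|A| + |B|). The sketch below computes it on sets of voxel coordinates and applies the 90 mm³ exclusion rule. Whether the original analysis excluded a pair when either or both volumes fell below the threshold is not stated; this sketch assumes either, and it also assumes 1 mm³ voxels. The function names and the cubic toy segmentations are hypothetical.

```python
from typing import Optional, Set, Tuple

Voxel = Tuple[int, int, int]


def dice(a: Set[Voxel], b: Set[Voxel]) -> float:
    """Dice-coefficient 2|A ∩ B| / (|A| + |B|) of two voxel sets."""
    if not a and not b:
        return 1.0  # convention for two empty masks (an assumption)
    return 2 * len(a & b) / (len(a) + len(b))


def dice_if_large_enough(a: Set[Voxel], b: Set[Voxel],
                         voxel_volume_mm3: float = 1.0,
                         min_volume_mm3: float = 90.0) -> Optional[float]:
    """Return None if either segmentation is smaller than the volume threshold."""
    if min(len(a), len(b)) * voxel_volume_mm3 < min_volume_mm3:
        return None
    return dice(a, b)


def cube(side: int, shift: int = 0) -> Set[Voxel]:
    """Toy segmentation: a cube of `side` voxels per edge, shifted along x by `shift`."""
    return {(x + shift, y, z)
            for x in range(side) for y in range(side) for z in range(side)}


print(dice_if_large_enough(cube(5), cube(5, shift=1)))  # 0.8 (125 mm^3 each)
print(dice_if_large_enough(cube(4), cube(4, shift=1)))  # None (64 mm^3 < 90 mm^3)
```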

To quantify the agreement between raters regarding tumor volume progression, the slope between consecutive time points $t_i$ and $t_{i+1}$ was first measured for each rater $r$, i.e. $s_r(t_i) = \frac{V_r(t_{i+1}) - V_r(t_i)}{t_{i+1} - t_i}$. Negative and positive slopes then correspond to shrinking and growing tumor volumes, respectively. A different sign (negative or positive) between raters is regarded as a disagreement. The total number of disagreements is then used to characterise the disagreement on the complete longitudinal tumor volume progression. Furthermore, we computed the relative increase or decrease of a particular volume between two consecutive time points for each rater, $q_r(t_i) = V_r(t_{i+1}) / V_r(t_i)$. For visualisation, we plotted the decimal logarithm $\log_{10} q_r(t_i)$, which we refer to simply as a disagreement plot. A volume that is stable from one time point to the next yields a value of zero. A disagreement between raters results in one or more data points lying on different sides of the zero line. If such a disagreement occurs, we measure the absolute difference (along the y-axis) between the disagreeing raters. This analysis was performed for all possible pairs of raters over the complete data set. The results are summarised in what we refer to as a disagreement matrix, in which every entry corresponds to the total sum of all measured disagreements (absolute differences) for a pair of raters. This yields more information about the extent to which two raters disagree with each other. The matrix is symmetric because we measure the absolute difference.
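The disagreement measures defined above can be sketched in Python for a set of patients and raters. This is a minimal illustration, not the authors' R implementation. It assumes that a zero slope is not a disagreement (only strictly opposite signs count) and, because the handling of zero volumes (e.g. after complete resection) is not specified, it skips the magnitude term for transitions where $q_r$ is undefined. Function names and toy volumes are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Optional, Sequence, Tuple

Patient = Tuple[Sequence[float], Dict[str, Sequence[float]]]  # (times, volumes per rater)


def slopes(times: Sequence[float], vols: Sequence[float]) -> List[float]:
    """Slope s_r(t_i) between each pair of consecutive time points."""
    return [(vols[i + 1] - vols[i]) / (times[i + 1] - times[i])
            for i in range(len(vols) - 1)]


def log_ratios(vols: Sequence[float]) -> List[Optional[float]]:
    """log10 of q_r(t_i) = V(t_{i+1}) / V(t_i); None if a volume is zero."""
    return [math.log10(v1 / v0) if v0 > 0 and v1 > 0 else None
            for v0, v1 in zip(vols, vols[1:])]


def pair_disagreement(times: Sequence[float], a: Sequence[float],
                      b: Sequence[float]) -> Tuple[int, float]:
    """Number of opposite-sign transitions and summed |log10 q_a - log10 q_b|."""
    count, extent = 0, 0.0
    for sa, sb, la, lb in zip(slopes(times, a), slopes(times, b),
                              log_ratios(a), log_ratios(b)):
        if sa * sb < 0:  # one rater sees growth, the other shrinkage
            count += 1
            if la is not None and lb is not None:
                extent += abs(la - lb)
    return count, extent


def disagreement_matrix(patients: Sequence[Patient]) -> Dict[str, Dict[str, float]]:
    """Sum the disagreement extents of every rater pair over all patients."""
    raters = list(patients[0][1])
    matrix = {r: {s: 0.0 for s in raters} for r in raters}
    for times, vols in patients:
        for r, s in combinations(raters, 2):
            _, extent = pair_disagreement(times, vols[r], vols[s])
            matrix[r][s] += extent
            matrix[s][r] += extent  # symmetric because |x - y| == |y - x|
    return matrix


# Toy data for one patient at four ordinal time points (volumes in ml).
times = [0, 1, 2, 3]
vols = {"B": [40, 20, 25, 20], "R1": [35, 15, 20, 22], "R2": [38, 18, 24, 21]}
print(pair_disagreement(times, vols["B"], vols["R1"]))  # count 1, extent about 0.138
print(disagreement_matrix([(times, vols)]))
```

Because only the sign of each slope matters for counting, ordinal time indices are sufficient here; actual acquisition dates would give the same counts.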

In the remainder of this paper, we refer to BraTumIA (B) as well as the human raters (R1 & R2) simply as raters for the sake of clarity, but make a distinction between automated and manual results when appropriate. For the statistical analysis, we used the R software package (R Development Core Team).

Results

An exemplary segmentation of a particular patient over time by each of BraTumIA, Rater-1 and Rater-2 is shown in Fig. 1. Note that the segmentation result of BraTumIA indicates the same volume progression as the manual segmentations. Tumor volumes were obtained from all 64 MR acquisitions, and the corresponding volume differences between consecutive time points (n = 50) were computed. In Fig. 2, absolute volumes and volume differences between consecutive time points as measured by BraTumIA are plotted against the estimates of Rater-1 and Rater-2. Strong significant correlations (r-values ranging from 0.83 to 0.96, p < 0.001) were observed between all estimates of BraTumIA and those of each of the raters. The Dice-coefficients measured between BraTumIA and each of the human raters are reported in Table 2.
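The correlation analysis can be illustrated with Pearson's r between paired volume estimates. The authors used R for their statistics (there, `cor.test(x, y)` reports r together with a p-value); the Python sketch below only computes r itself, and the paired toy volumes are hypothetical.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two paired samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two paired samples with at least two values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical paired CET volumes (ml): BraTumIA vs. one human rater.
bratumia = [32.0, 4.1, 2.5, 6.0, 11.8]
rater = [28.5, 2.0, 2.2, 5.1, 10.9]
print(round(pearson_r(bratumia, rater), 3))
```

Note that r is dominated by large values (here the single preoperative-sized volume), which is one reason the study also examines volume differences and trend disagreements rather than correlations alone.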

Table 2 Dice-coefficients for non-enhancing T₂-hyperintense tissue (NCE-T₂) and contrast-enhancing tumor (CET) as tuple (median, range) generated by BraTumIA (B) with respect to either one of two human raters (R1/R2).

Segmentation Tendencies

Fig. 3 shows the volumetric trend lines of a representative patient as defined by the different raters. For the NCE-T₂ compartment, BraTumIA tended to yield a larger postoperative absolute volume than the human raters. For the contrast-enhancing tumor, BraTumIA yielded larger absolute volumes for the preoperative, immediate postoperative and one-month follow-up measurements, whereas for the remaining follow-up images its volumetric estimates were located between those of the human raters. When the relative over- or underestimation of BraTumIA is computed with respect to each human rater across all patients, a trend similar to that of the patient shown in Fig. 3 can be observed. Table 3 gives an overview of the tendencies of BraTumIA with respect to the human raters. A consistent overestimation was observed for all postoperative time points in the case of the NCE-T₂ compartment, as well as for the preoperative and immediate postoperative CET volumes. For the segmentation of CET in the remaining follow-up images, BraTumIA yielded absolute volumes within the range of volumes defined by both human raters.
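The relative over- or underestimation summarised in Table 3 can be sketched as the relative deviation (V_B − V_R)/V_R of BraTumIA's volume from a human rater's volume, followed by the median over patients. The exact formula is not given in the text, so this definition is an assumption; in addition, patients whose manual volume is zero (e.g. after complete resection) make the ratio undefined and are simply skipped here, which may differ from the original analysis. Names and numbers are hypothetical.

```python
from statistics import median
from typing import Optional, Sequence


def relative_deviation(v_auto: float, v_manual: float) -> Optional[float]:
    """(V_auto - V_manual) / V_manual; positive values mean overestimation."""
    if v_manual == 0:
        return None  # undefined, e.g. after complete resection
    return (v_auto - v_manual) / v_manual


def median_relative_deviation(auto_vols: Sequence[float],
                              manual_vols: Sequence[float]) -> Optional[float]:
    """Median relative deviation over all patients with a defined ratio."""
    devs = [d for d in (relative_deviation(a, m)
                        for a, m in zip(auto_vols, manual_vols))
            if d is not None]
    return median(devs) if devs else None


# Hypothetical NCE-T2 volumes (ml) of three patients at one time point.
print(median_relative_deviation([12.0, 9.0, 30.0], [10.0, 10.0, 20.0]))  # 0.2
```

The median is robust to the very large ratios that arise when a manual volume is small but non-zero, which is relevant for residual CET after surgery.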

Table 3 Relative over- or underestimation of the volumes for non-enhancing T₂-hyperintense tissue (NCE-T₂) and contrast-enhancing tumor (CET) as median values generated by BraTumIA (B) with respect to either one of two human raters (R1/R2).

Longitudinal Volumetry

The volumetric estimates for the NCE-T₂ compartment and CET were normalised with respect to the first measurement (i.e. t_pre). This mitigates the individual tendencies described above and makes differences in the volumetric trend between raters easier to identify. Fig. 4 shows the relative volumetric curves for nine patients. For the first six patients, general agreement between all three raters on the trend of the NCE-T₂ compartment as well as the contrast-enhancing tumor was observed. For patient “g”, the occurrence of a lamellar enhancement close to the resection site of the primary tumor led to a larger increase in CET volume computed by BraTumIA and Rater-2 compared to Rater-1. For patient “h”, a diffuse and weak tumor enhancement occurred, which was only detected by Rater-1 (at t₃). For patient “i”, disagreements between all three raters can be observed for small NCE-T₂ volume changes occurring after surgery. From a qualitative point of view, 11 out of 14 patients showed good agreement among all three raters. Three patients showed a (clinically) significant deviation between the raters: two of them are depicted in Fig. 4, and the third (depicted in supplementary Fig. S1 “a”) underwent biopsy instead of surgical resection. A complete overview of the longitudinal volumetric curves for all 14 patients is given in Figs 4 and S1.
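The normalisation to the preoperative measurement amounts to dividing each rater's curve by its value at t_pre. One consequence, shown in the last line below, is that a rater who systematically reports volumes a constant factor larger produces exactly the same normalised curve. The function name and the volumes are hypothetical.

```python
from typing import List, Sequence


def normalise_to_baseline(volumes: Sequence[float]) -> List[float]:
    """Express a volume curve relative to its first (preoperative) value."""
    baseline = volumes[0]
    if baseline <= 0:
        raise ValueError("the preoperative volume must be positive")
    return [v / baseline for v in volumes]


# Hypothetical CET curves (ml) at t_pre, t_post, t_1 and t_3.
rater_curve = [40.0, 2.0, 3.0, 5.0]
biased_curve = [2 * v for v in rater_curve]  # same trend, twice the volume
print(normalise_to_baseline(rater_curve))  # [1.0, 0.05, 0.075, 0.125]
print(normalise_to_baseline(biased_curve) == normalise_to_baseline(rater_curve))  # True
```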

Inter-rater Disagreement

In total, segmentations were performed at 64 individual time points. Because the first time point of each patient has no predecessor, this yielded 64 − 14 = 50 transitions from one time point to the next at which a disagreement can potentially occur. When comparing the trend lines of BraTumIA against Rater-1 for all 50 transitions, a disagreement on the trend of the NCE-T₂ compartment was found in 11 transitions (22%) and in four transitions (8%) for the contrast-enhancing tumor. When comparing BraTumIA against Rater-2, a disagreement was detected for the NCE-T₂ compartment in 12 transitions (24%) and in two transitions (4%) for the contrast-enhancing tumor. For the two human raters, a disagreement on the NCE-T₂ compartment occurred in 13 transitions (26%) and in four transitions (8%) for the contrast-enhancing tumor. In six of the 50 transitions (12%), both human raters agreed on the trend of the NCE-T₂ tissue while BraTumIA disagreed (three of these transitions belong to the patient shown in Fig. 3); this situation did not occur for the contrast-enhancing tumor. The disagreement plot for an exemplary patient is shown in Fig. 5. Figures 6 and 7 present the disagreement matrices for NCE-T₂ tissue and CET, respectively. Comparing the two matrices shows that when a disagreement between raters (as defined in the Statistical Analysis section) occurs, it is larger for the contrast-enhancing volume than for the NCE-T₂ tissue. In general, BraTumIA disagreed less with Rater-2 than with Rater-1. For the NCE-T₂ compartment, the total disagreement between BraTumIA and either of the two human raters (3.17 and 2.74) was smaller than the disagreement between the two human raters (3.43). For the contrast-enhancing tumor, the total disagreement between BraTumIA and Rater-1 (7.8) was larger than the disagreement between the two human raters (7.35).

Discussion

The study at hand provides evidence for the capability of a fully-automatic segmentation method for longitudinal brain tumor volumetry by comparing its performance against two human raters. Brain tumor segmentation is routinely needed in radiation oncology, where it plays a crucial role in radiotherapy planning. A tumor segmentation yields information about the tumor's volume as well as its position relative to neighboring and potentially eloquent anatomical structures. Manual segmentation is inherently subjective; thus the estimated volumes show large variability between different raters⁸. In radiotherapy, planning is greatly influenced by an accurate estimation of the target volumes, which means that segmentations by different raters can lead to different treatment plans. Early on, researchers in radiation oncology investigated computer-assisted methods to relieve clinicians of the time-consuming burden of manual segmentation and to generate more consistent volumetric estimates³². The potential areas of application for brain tumor volumetry span well beyond radiotherapy and include neurooncology (response assessment^6,27,33,34,35,36), neurosurgery^37,38,39 and radiogenomics^12,40,41.

BraTumIA is a fully-automatic, machine learning-based segmentation model capable of subdividing a glioma into its compartments (necrosis, edema, enhancing and non-enhancing tumor). Its performance was compared against alternative approaches in the MICCAI BraTS Challenges^10,17, where it proved to be one of the best-performing as well as one of the fastest methods. Moreover, BraTumIA was previously evaluated prospectively on clinical data sets for preoperative brain tumor segmentation¹⁹, where strong correlations were observed between automatically and manually generated tumor volumes. Furthermore, the tumor volumes estimated by BraTumIA were recently shown to be associated with patient survival⁴². The software is equipped with a Graphical User Interface (GUI), which facilitates its use by clinicians, and it has been made publicly available (https://www.nitrc.org/projects/bratumia/). These properties led us to employ BraTumIA for automatic longitudinal tumor volumetry.

The agreement between the segmentation results of BraTumIA and the estimates of the two human raters was assessed by computing Dice-coefficients. Table 2 indicates that preoperative segmentation is the easiest task both for BraTumIA and for the human raters. Notably, the measured preoperative Dice-coefficients are comparable to the values reported during the MICCAI BraTS 2012/2013 challenges¹⁰. Furthermore, the segmentation of residual CET after surgery is extremely challenging, which is confirmed by the low Dice-coefficients for all pairings of raters. However, overlap measures such as the Dice-coefficient are overly sensitive when the segmented image regions are small, which is the case for residual CET in postoperative images. The segmentation of both NCE-T₂ tissue and CET appears to be challenging in the one-month follow-up acquisition. The reason is the emergence, at that time, of treatment-induced changes in image appearance that resemble pathological changes (e.g. benign contrast enhancement). The decrease in volumetric overlap between the estimates of BraTumIA and the two human raters for postoperative images could be explained by the fact that the majority of the available training data (36 of 54 data sets) were preoperative images. The addition of more postoperative images would likely improve the performance of BraTumIA for segmenting later time points. A thorough comparison of the segmentation performance of BraTumIA with several other segmentation techniques is provided by Menze et al.¹⁰ based on the results of the MICCAI BraTS challenges 2012/2013 (in that publication, BraTumIA is referred to as the approach proposed by Meier et al.¹⁷).
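A simple worked example, assuming 1 mm isotropic voxels and an idealised cubic region, shows why the Dice-coefficient is so sensitive for small volumes. If a cube of side s voxels (segmentation A) is compared with a copy of itself (segmentation B) shifted by a single voxel along one axis, then

$$
|A| = |B| = s^{3}, \qquad |A \cap B| = (s-1)\,s^{2}, \qquad
D = \frac{2\,|A \cap B|}{|A| + |B|} = \frac{2(s-1)\,s^{2}}{2\,s^{3}} = \frac{s-1}{s}.
$$

For s = 4 (64 mm³, below the 90 mm³ threshold) a one-voxel misalignment already reduces D to 0.75, whereas for s = 20 (8000 mm³) the same misalignment still gives D = 0.95.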

BraTumIA proved to effectively capture the volumetric trend of the NCE-T₂ compartment and CET over time. A tendency towards overestimation after surgery with respect to both human raters was observed for the NCE-T₂ tissue. This corresponds to a relatively stable bias of the volumetric trend curve generated by BraTumIA. In neurooncology, criteria for assessing response to therapy rely on relative changes in tumor size between a given point in time and a baseline²⁷. Thus, a decision on response to therapy is not affected by a constant relative (i.e. multiplicative) bias of the chosen measurement method. For the CET volume measured immediately after surgery, BraTumIA tends to oversegment the residual tumor volume and, in cases of complete resection, rarely segmented a volume of zero mm³. The high values reported in Table 3 for immediate postoperative CET volumes are therefore due to the fact that seven out of 14 patients underwent complete resection. Residual tumor volumes are usually small and often rated as unmeasurable by the current response assessment criteria. In order to employ BraTumIA for response assessment, it is crucial to study the effect of lesions that would be rated as non-measurable becoming measurable through automatic volumetry. This effect may be amplified by the overestimation of residual volumes compared with manual volumetry. More specialised algorithms developed specifically for the assessment of residual tumor volume after surgery have been proposed^7,43,44 and may yield improved performance over general segmentation methods like BraTumIA.
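The statement that a constant relative bias leaves response assessment unaffected can be checked directly. Writing $V_R(t)$ for a rater's volume, $V_B(t)$ for BraTumIA's and $t_0$ for the baseline, a multiplicative bias $V_B(t) = k\,V_R(t)$ with $k > 0$ cancels in the relative change, whereas an additive offset $V_B(t) = V_R(t) + c$ generally does not:

$$
\frac{V_B(t)}{V_B(t_0)} = \frac{k\,V_R(t)}{k\,V_R(t_0)} = \frac{V_R(t)}{V_R(t_0)},
\qquad
\frac{V_R(t) + c}{V_R(t_0) + c} = \frac{V_R(t)}{V_R(t_0)}
\iff c\,\bigl(V_R(t_0) - V_R(t)\bigr) = 0 .
$$

With illustrative numbers, a decrease from 10 ml to 5 ml (−50%) would appear as a decrease from 15 ml to 10 ml (about −33%) under an additive offset of c = 5 ml.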

Our results show that the longitudinal CET volumetry generated by BraTumIA is closer to Rater-2 than to Rater-1. In general, BraTumIA as well as Rater-2 tend to segment CET more aggressively than Rater-1, which is clearly visible in their oversegmentation of CET compared to Rater-1. This can potentially lead to false positives, as in the case of patient “g”, where both BraTumIA and Rater-2 marked a lamellar enhancement appearing at the border of the resection cavity (one month after surgery) as contrast-enhancing tumor. Follow-up images revealed that the enhancement was a consequence of the applied treatment regime. Conversely, for patient “h”, BraTumIA and Rater-2 failed to recognise a subtle contrast-enhancing tumor (at t₃). Rater-1 was an experienced neuroradiologist, whereas Rater-2 was an M.D. student with initial experience in the field. Consequently, one can argue that in both situations Rater-1 relied on his experience as well as on knowledge about the patient’s history and applied treatment regime to either rule in or rule out the presence of contrast-enhancing tumor. BraTumIA, however, segments every patient image individually and does not use information extracted from previous scans of the same patient. Hence, we think that BraTumIA might be improved by a more explicit inclusion of a patient's past imaging information⁴⁵ as well as of clinical variables describing the applied treatment regime (e.g. radiotherapy yes/no). This may enable BraTumIA to better rule out non-tumorous contrast enhancement as well as to detect subtle changes within a patient. Furthermore, we observed that for the NCE-T₂ compartment, disagreements occurred predominantly for small relative changes in volume (e.g. for patient “i”). In the context of response assessment, such changes would likely be rated as stable disease.

The analysis of disagreements between raters was driven by the importance of the relative change in tumor size for tumor response assessment^1,27. BraTumIA showed a disagreement with the two human raters that lay in the range of the disagreement between the human raters themselves. Bidimensional measures used for response assessment have well-known limitations. Wen et al.²⁷ suggested that a main obstacle to the inclusion of volumetric information in response assessment is the lack of standardisation in volumetric imaging. In addition, the time necessary to acquire volumetric information manually renders its clinical use infeasible. BraTumIA was trained on imaging data that is not part of this study and that was acquired following different acquisition protocols. The software is capable of generalising imaging patterns of CET and NCE-T₂ compartments learned from training data to unseen patient data. Moreover, BraTumIA can provide volumetric information on the tumor compartments fully automatically within five minutes (average computation time per patient).

We proposed disagreement plots and matrices for evaluating the disagreement between different raters. An alternative method to analyse the disagreement between raters or measurement methods is the Bland-Altman plot⁴⁶, which is widely used (e.g. by Bauknecht et al.⁴⁷). The main goal of our data analysis was to capture and quantify deviations between the volumetric trends for CET and the NCE-T₂ compartment estimated by BraTumIA and the two raters. A Bland-Altman approach to this problem would require one plot for every transition between consecutive time points and every possible pairing of measurement methods or raters. Given the small sample size for late postoperative transitions (e.g. n = 4 from t₆ to t₉), the estimation of 95% limits of agreement in the Bland-Altman plot would be unreliable. Consequently, we opted for a more descriptive and visual approach in the form of disagreement plots and matrices. A disagreement plot is patient-specific and retains the complete temporal evolution as well as all employed raters. In a next step, a specific feature of the data (e.g. the absolute differences between disagreeing raters) can be summarised over all patients, e.g. in the form of disagreement matrices. Different definitions of disagreement would result in different matrices that are not necessarily symmetric. For a larger number of raters, the correlation between disagreement matrices could potentially be assessed by a Mantel test⁴⁸. Weizman et al.²¹ followed a different approach for assessing the agreement between longitudinal volumetric curves. They estimated Pearson’s correlation coefficients between two subsequent measurements of two raters. This requires the assumption of a linear relationship between the volumetric measurements of two consecutive time points. Furthermore, summarising correlation coefficients over a number of samples (= patients) is not straightforward.
In contrast, the use of disagreement plots and matrices is nonparametric and allows the extraction of different features, which can be summarised over all patients.

The present study has two limitations. First, we limited our analysis to the two morphologically most discriminative tumor compartments. This restriction was driven by the lack of histologically confirmed ground truth data for postoperative images. In clinical practice, however, a radiologist segments the extent of the tumor compartment that is visually apparent in MR images. Because the postoperative discrimination between edema and non-enhancing tumor is very difficult even for expert raters, and because histological confirmation was not available, we combined the segmentations of edema and non-enhancing tumor into one compartment (i.e. the non-enhancing T₂-hyperintense tissue). This approach of analysing the non-enhancing T₂-hyperintense part of the tumor has also been used previously^20,40. Second, a larger number of human raters would have allowed a more informative assessment of the inter-rater variability of manual segmentations relative to the automatic segmentations generated by BraTumIA. Nonetheless, employing two human raters for segmentation has been the approach of several previous studies, as reported by Bauknecht et al.⁴⁷.
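The merging of edema and non-enhancing tumor into one compartment can be sketched as follows, assuming a segmentation stored as a flattened sequence of integer labels with a known voxel spacing. The label codes, the function name `compartment_volumes_ml` and the toy label map are hypothetical; they illustrate pooling two labels into one NCE-T₂ compartment and are not BraTumIA's actual output format. Necrosis is deliberately left out of both compartments in this sketch.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

# Hypothetical integer label codes; the encoding used by BraTumIA may differ.
BACKGROUND = 0
NECROSIS = 1
EDEMA = 2
NON_ENHANCING = 3
CONTRAST_ENHANCING = 4

# Analysed compartments, with edema and non-enhancing tumor merged into NCE-T2.
COMPARTMENTS: Dict[str, Set[int]] = {
    "CET": {CONTRAST_ENHANCING},
    "NCE-T2": {EDEMA, NON_ENHANCING},
}


def compartment_volumes_ml(labels: Iterable[int],
                           spacing_mm: Tuple[float, float, float]) -> Dict[str, float]:
    """Volume in ml of each compartment, from a flattened label map and voxel spacing in mm."""
    voxel_ml = spacing_mm[0] * spacing_mm[1] * spacing_mm[2] / 1000.0  # 1 ml = 1000 mm^3
    counts = Counter(labels)  # missing labels count as zero voxels
    return {name: sum(counts[code] for code in codes) * voxel_ml
            for name, codes in COMPARTMENTS.items()}


# Toy label map at 1 mm isotropic resolution (hypothetical voxel counts).
toy = ([BACKGROUND] * 10 + [NECROSIS] * 300 + [EDEMA] * 1000
       + [NON_ENHANCING] * 500 + [CONTRAST_ENHANCING] * 2000)
volumes = compartment_volumes_ml(toy, (1.0, 1.0, 1.0))
```

Defining compartments as sets of labels keeps the merging rule in one place, so a different grouping (e.g. analysing edema separately) would only require changing the `COMPARTMENTS` mapping.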

In summary, BraTumIA generated volumetric trend curves of the contrast-enhancing and non-enhancing T₂-hyperintense tumor compartments that were comparable to the estimates of human raters. Strong correlations were observed between the volumetric estimates of BraTumIA and those of each of the two raters. The frequency and extent of disagreement between BraTumIA and either human rater were comparable to those measured between the two human raters themselves. These findings suggest that BraTumIA could substitute for manual volumetric follow-up of contrast-enhancing and non-enhancing T₂-hyperintense tumor compartments over time.
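The comparison of frequency and extent of disagreement across rater pairs can be illustrated with a small tabulation over patients. In this sketch, frequency is taken as the share of transitions in which exactly one rater of a pair reports growth, and extent as the mean absolute difference between the two raters' relative volume changes, pooled over all transitions of all patients. These definitions, the function names and the example volumes are assumptions made for illustration and are not necessarily the definitions used in the study.

```python
import itertools
import statistics
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def _relative_changes(volumes_ml: List[float]) -> List[float]:
    # Assumes every volume is non-zero; a zero earlier volume would make the ratio undefined.
    return [(v1 - v0) / v0 for v0, v1 in zip(volumes_ml, volumes_ml[1:])]


def disagreement_summary(patients: List[Dict[str, List[float]]]
                         ) -> Tuple[Dict[Pair, float], Dict[Pair, float]]:
    """Summarise disagreement per rater pair, pooled over all transitions of all patients.

    Returns (frequency, extent), both keyed symmetrically by (rater_a, rater_b):
    frequency = share of transitions in which exactly one rater reports growth,
    extent    = mean absolute difference between the two raters' relative changes.
    """
    raters = sorted(patients[0])
    frequency: Dict[Pair, float] = {}
    extent: Dict[Pair, float] = {}
    for a, b in itertools.combinations(raters, 2):
        opposite = 0
        diffs: List[float] = []
        for volumes in patients:
            for ca, cb in zip(_relative_changes(volumes[a]), _relative_changes(volumes[b])):
                if (ca > 0) != (cb > 0):  # one rater sees growth, the other does not
                    opposite += 1
                diffs.append(abs(ca - cb))
        frequency[(a, b)] = frequency[(b, a)] = opposite / len(diffs)
        extent[(a, b)] = extent[(b, a)] = statistics.mean(diffs)
    return frequency, extent


# Hypothetical volumes (ml): one dict per patient, one list of time points per rater.
patients = [
    {"BraTumIA": [30.0, 5.0, 6.0], "R1": [32.0, 4.0, 5.0], "R2": [31.0, 6.0, 5.5]},
    {"BraTumIA": [20.0, 10.0], "R1": [22.0, 12.0], "R2": [21.0, 11.0]},
]
freq, ext = disagreement_summary(patients)
```

Pooling transitions weights patients with more follow-up scans more heavily; summarising per patient first and then averaging would be an equally plausible design choice.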

Additional Information

How to cite this article: Meier, R. et al. Clinical Evaluation of a Fully-automatic Segmentation Method for Longitudinal Brain Tumor Volumetry. Sci. Rep. 6, 23376; doi: 10.1038/srep23376 (2016).

References

1. Chinot, O. L. et al. Response assessment criteria for glioblastoma: practical adaptation and implementation in clinical trials of antiangiogenic therapy. Curr. Neurol. Neurosci. Rep. 13, 347 (2013).
2. Sorensen, A. G. et al. Comparison of diameter and perimeter methods for tumor volume calculation. J. Clin. Oncol. 19, 551–557 (2001).
3. Fraioli, F. et al. Volumetric evaluation of therapy response in patients with lung metastases. Preliminary results with a computer system (CAD) and comparison with unidimensional measurements. Radiol. Med. 111, 365–375 (2006).
4. Marten, K. et al. Inadequacy of manual measurements compared to automated CT volumetry in assessment of treatment response of pulmonary metastases using RECIST criteria. Eur. Radiol. 16, 781–790 (2006).
5. Kanaly, C. W. et al. A novel, reproducible and objective method for volumetric magnetic resonance imaging assessment of enhancing glioblastoma. J. Neurosurg. 76, 1–7 (2014).
6. Reuter, M. et al. Impact of MRI head placement on glioma response assessment. J. Neurooncol. 118, 123–129 (2014).
7. Kanaly, C. W. et al. A novel method for volumetric MRI response assessment of enhancing brain tumors. PLoS One 6, doi: 10.1371/journal.pone.0016031 (2011).
8. Weltens, C. et al. Interobserver variations in gross tumor volume delineation of brain tumors on computed tomography and impact of magnetic resonance imaging. Radiother. Oncol. 60, 49–59 (2001).
9. Egger, J. et al. GBM volumetry using the 3D Slicer medical image computing platform. Sci. Rep. 3, doi: 10.1038/srep01364 (2013).
10. Menze, B. H. et al. The multimodal brain tumor image segmentation benchmark (BRATS). IEEE Trans. Med. Imaging 34, 1993–2024 (2015).
11. Pope, W. B. et al. MR imaging correlates of survival in patients with high-grade gliomas. AJNR Am. J. Neuroradiol. 26, 2466–2474 (2005).
12. Iliadis, G. et al. Volumetric and MGMT parameters in glioblastoma patients: Survival analysis. BMC Cancer 12, doi: 10.1186/1471-2407-12-3 (2012).
13. Gutman, D. A. et al. MR imaging predictors of molecular profile and survival: multi-institutional study of the TCGA glioblastoma data set. Radiology 267, 560–569 (2013).
14. Mazurowski, M. A., Desjardins, A. & Malof, J. M. Imaging descriptors improve the predictive power of survival models for glioblastoma patients. Neuro Oncol. 15, 1389–1394 (2013).
15. Bauer, S., Nolte, L.-P. & Reyes, M. Fully Automatic Segmentation of Brain Tumor Images using Support Vector Machine Classification in Combination with Hierarchical Conditional Random Field Regularization. In MICCAI 2011 of LNCS Vol. 6893 (eds Fichtinger, G. et al.), 354–361 (Springer, 2011).
16. Zikic, D. et al. Decision forests for tissue-specific segmentation of high-grade gliomas in multi-channel MR. In MICCAI 2012 of LNCS Vol. 7512 (eds Ayache, N. et al.), 369–376 (Springer, 2012).
17. Meier, R., Bauer, S., Slotboom, J., Wiest, R. & Reyes, M. Appearance- and context-sensitive features for brain tumor segmentation. In Proceedings of MICCAI BRATS Challenge. (2014) Available at: http://people.csail.mit.edu/menze/papers/proceedings_miccai_brats_2014.pdf. (Accessed: 5th May 2015).
18. Bauer, S., Wiest, R., Nolte, L.-P. & Reyes, M. A survey of MRI-based medical image analysis for brain tumor studies. Phys. Med. Biol. 58, 97–129 (2013).
19. Porz, N. et al. Multi-modal glioblastoma segmentation: Man versus machine. PLoS One 9, doi: 10.1371/journal.pone.0096873 (2014).
20. Steed, X. T. C. et al. Iterative probabilistic voxel labeling: Automated segmentation for analysis of the cancer imaging archive glioblastoma Images. AJNR Am. J. Neuroradiol. 36, 678–685 (2015).
21. Weizman, L. et al. Semiautomatic segmentation and follow-up of multicomponent low-grade tumors in longitudinal brain MRI studies. Med. Phys. 41, doi: 10.1118/1.4871040 (2014).
22. Liberman, G. et al. Automatic multi-modal MR tissue classification for the assessment of response to bevacizumab in patients with glioblastoma. Eur. J. Radiol. 82, 87–94 (2013).
23. MacDonald, D., Cascino, T., Schold, S. J. & Cairncross, J. Response criteria for phase II studies of supratentorial malignant glioma. J. Clin. Oncol. 8, 1277–1280 (1990).
24. Bauer, S., Fejes, T. & Reyes, M. A skull-stripping filter for ITK. Insight J. 20, 1–7 (2012).
25. Jakab, A. Segmenting brain tumors with the slicer 3D software. Technical Report. (2012) Available at: http://www2.imm.dtu.dk/projects/BRATS2012/Jakab_TumorSegmentation_Manual.pdf. (Accessed: 11th May 2015).
26. Fedorov, A. et al. 3D Slicer as an image computing platform for the quantitative imaging network. Magn. Reson. Imaging 30, 1323–1341 (2012).
27. Wen, P. Y. et al. Updated response assessment criteria for high-grade gliomas: Response assessment in neuro-oncology working group. J. Clin. Oncol. 28, 1963–1972 (2010).
28. Tustison, N. J. et al. N4ITK: Improved N3 bias correction. IEEE Trans. Med. Imaging 29, 1310–1320 (2010).
29. Nyul, L. G., Udupa, J. K. & Zhang, X. New variants of a method of MRI scale standardization. IEEE Trans. Med. Imaging 19, 143–150 (2000).
30. Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
31. Criminisi, A. & Shotton, J. In Decision Forests for Computer Vision and Medical Image Analysis (eds Criminisi, A. et al.), 25–45 (Springer, 2013).
32. Mazzara, G. P., Velthuizen, R. P., Pearlman, J. L., Greenberg, H. M. & Wagner, H. Brain tumor target volume determination for radiation treatment planning through automated MRI segmentation. Int. J. Radiat. Oncol. Biol. Phys. 59, 300–312 (2004).
33. Galanis, E. et al. Validation of neuroradiologic response assessment in gliomas: measurement by RECIST, two-dimensional, computer-assisted tumor area and computer-assisted tumor volume methods. Neuro Oncol. 8, 156–165 (2006).
34. Shah, G. D. et al. Comparison of linear and volumetric criteria in assessing tumor response in adult high-grade gliomas. Neuro Oncol. 8, 38–46 (2006).
35. Pirzkall, A. et al. Tumor regrowth between surgery and initiation of adjuvant therapy in patients with newly diagnosed glioblastoma. Neuro Oncol. 11, 842–852 (2009).
36. Ellingson, B. M. et al. Quantitative volumetric analysis of conventional MRI response in recurrent glioblastoma treated with bevacizumab. Neuro Oncol. 13, 401–409 (2011).
37. Stummer, W. et al. Extent of resection and survival in glioblastoma multiforme: Identification of and adjustment for bias. Neurosurgery 62, 564–574 (2008).
38. Jakola, A. S. et al. Surgically acquired deficits and diffusion weighted MRI changes after glioma resection – A matched case-control study with blinded neuroradiological assessment. PLoS One 9, doi: 10.1371/journal.pone.0101805 (2014).
39. Grabowski, M. M. et al. Residual tumor volume versus extent of resection: predictors of survival after surgery for glioblastoma. J. Neurosurg. 121, 1115–1123 (2014).
40. Zinn, P. O. et al. Radiogenomic mapping of edema/cellular invasion MRI-phenotypes in glioblastoma multiforme. PLoS One 6, doi: 10.1371/journal.pone.0025451 (2011).
41. Naeini, K. M. et al. Identifying the mesenchymal molecular subtype of glioblastoma using quantitative volumetric analysis of anatomic magnetic resonance images. Neuro Oncol. 15, 626–634 (2013).
42. Rios Velazquez, E. et al. Fully automatic GBM segmentation in the TCGA-GBM dataset: Prognosis and correlation with VASARI features. Sci. Rep. 5, doi: 10.1038/srep16822 (2015).
43. Cordova, J. S. et al. Quantitative tumor segmentation for evaluation of extent of glioblastoma resection to facilitate multisite clinical trials. Transl. Oncol. 7, 40–47 (2014).
44. Meier, R., Bauer, S., Slotboom, J., Wiest, R. & Reyes, M. Patient-specific semi-supervised learning for postoperative brain tumor segmentation. In MICCAI 2014 of LNCS Vol. 8673 (eds Golland, P. et al.), 714–721 (Springer, 2014).
45. Bauer, S., Tessier, J., Krieter, O., Nolte, L. P. & Reyes, M. Integrated spatio-temporal segmentation of longitudinal brain tumor imaging studies. In MICCAI 2013 MCV Workshop of LNCS Vol. 8331 (eds Menze, B. et al.), 74–83 (Springer, 2014).
46. Altman, D. G. & Bland, J. M. Measurement in medicine: the analysis of method comparison studies. Statistician 32, 307–317 (1983).
47. Bauknecht, H.-C. et al. Intra- and interobserver variability of linear and volumetric measurements of brain metastases using contrast-enhanced magnetic resonance imaging. Invest. Radiol. 45, 49–56 (2010).
48. Mantel, N. The detection of disease clustering and a generalized regression approach. Cancer Res. 27, 209–220 (1967).

Acknowledgements

This project has received funding from the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement No. 600841. Additionally, this work was supported by the Swiss National Foundation under grant number 140958.

Author information

Raphael Meier and Urspeter Knecht contributed equally to this work.
Roland Wiest and Mauricio Reyes jointly supervised this work.

Authors and Affiliations

Institute for Surgical Technology & Biomechanics, University of Bern, Bern, Switzerland
Raphael Meier, Stefan Bauer & Mauricio Reyes
Support Center for Advanced Neuroimaging – Institute for Diagnostic and Interventional Neuroradiology, University Hospital and University of Bern, Bern, Switzerland
Urspeter Knecht, Tina Loosli, Stefan Bauer, Johannes Slotboom & Roland Wiest

Contributions

R.M., U.K. and T.L. conceived and conducted the experiments, analysed the results and wrote the paper. S.B., J.S., R.W. and M.R. provided expert guidance or data and reviewed the manuscript.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Electronic supplementary material

Supplementary Information

Rights and permissions

This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

About this article

Cite this article

Meier, R., Knecht, U., Loosli, T. et al. Clinical Evaluation of a Fully-automatic Segmentation Method for Longitudinal Brain Tumor Volumetry. Sci Rep 6, 23376 (2016). https://doi.org/10.1038/srep23376

Received: 16 October 2015
Accepted: 04 March 2016
Published: 22 March 2016
DOI: https://doi.org/10.1038/srep23376
