Assessing robustness of radiomic features by image perturbation

Image features need to be robust against differences in positioning, acquisition and segmentation to ensure reproducibility. Radiomic models that only include robust features can be used to analyse new images, whereas models with non-robust features may fail to predict the outcome of interest accurately. Test-retest imaging is recommended for assessing robustness, but may not be available for the phenotype of interest. We therefore investigated 18 combinations of image perturbations to determine feature robustness, based on noise addition (N), translation (T), rotation (R), volume growth/shrinkage (V) and supervoxel-based contour randomisation (C). Test-retest and perturbation robustness were compared for a combined total of 4032 morphological, statistical and texture features that were computed from the gross tumour volume in two cohorts with computed tomography imaging: I) 31 non-small-cell lung cancer (NSCLC) patients; II) 19 head-and-neck squamous cell carcinoma (HNSCC) patients. Robustness was determined using the 95% confidence interval (CI) of the intraclass correlation coefficient (1,1). Features with CI ≥ 0.90 were considered robust. The NTCV, TCV, RNCV and RCV perturbation chains produced similar results and identified the fewest false positive robust features (NSCLC: 0.2–0.9%; HNSCC: 1.7–1.9%). Thus, these perturbation chains may be used as an alternative to test-retest imaging to assess feature robustness.


Introduction
Radiomics is the high-throughput quantitative analysis of medical imaging to facilitate model-based treatment decisions 1,2. It relies on the computation of image biomarkers (features) within a region of interest (ROI). Features quantify different aspects of the ROI, such as mean intensity, volume and texture heterogeneity. Variations in patient positioning, image acquisition and segmentation affect each feature to varying degrees 3,4. If radiomic models use features that are not robust against such influences, they will perform poorly when applied to new data 5. Assessing feature robustness is thus recommended to improve the generalisability of radiomic models.

Figure 1 - Perturbation examples (panels: original, volume adaptation, contour randomisation, noise addition, translation, rotation). To perturb an image (blue) and the region of interest mask (orange overlay), the original image is translated, rotated, noised, and has its mask adapted and randomised. Translation and rotation change both the image and its mask, whereas noise only distorts the image. Volume adaptation and contour randomisation change the mask by adding (green overlay) and removing voxels (red overlay). Note that translation and rotation require additional interpolation (not shown).

Figure 2 - Workflow to determine the test-retest and perturbation intraclass correlation coefficients (ICC) for each feature (panels: test-retest ICC, perturbation ICC). The test-retest ICC was calculated directly between the same features in both images. To derive the perturbation ICC, an ICC was first calculated between feature values in perturbations of image 1 (ICC 1) and then again in perturbations of image 2 (ICC 2). The perturbation ICC is the average of ICC 1 and ICC 2.

Comparison between NSCLC and HNSCC cohorts
To validate the basic premise that feature robustness depends on the phenotype, we compared feature robustness based on the test-retest ICC in both cohorts. In the NSCLC cohort, 2963 (73.5%) features were robust and 1069 (26.5%) were non-robust. In the HNSCC cohort, 1369 (34.0%) features were robust and 2663 (66.0%) non-robust. 1116 (27.7%) and 816 (20.2%) features were robust and non-robust in both cohorts, respectively. The remaining 2100 (52.1%) features were assessed differently between cohorts: 1847 (45.8%) features were robust in the NSCLC cohort but not in the HNSCC cohort, and 253 (6.3%) features vice versa.

Feature-wise comparison of perturbation and test-retest robustness
Test-retest and perturbation robustness were also compared directly for the same feature. A feature is either robust under both perturbation and test-retest conditions, non-robust under both, or robust under test-retest or perturbation conditions only. Using test-retest robustness as a reference, these conditions represent true positive, true negative, false negative and false positive cases, respectively. The direct comparison of robustness is presented in Figure 4.
No perturbation identified every feature that was non-robust under test-retest conditions. The number of false positives differed between perturbations and cohorts. On average, perturbation chains yielded fewer false positives in the NSCLC cohort than in the HNSCC cohort (7.8% vs. 30.3%).
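Using test-retest robustness as the reference standard, the four cases can be expressed as a small helper function (a sketch; the threshold follows the study, but the example values are purely illustrative):

```python
# Classify one feature by comparing perturbation robustness against the
# test-retest reference (ICC threshold of 0.90 in both cases).
def classify(test_retest_icc, perturbation_icc, threshold=0.90):
    """Return 'TP', 'TN', 'FP' or 'FN' for one feature."""
    reference = test_retest_icc >= threshold   # robust under test-retest
    candidate = perturbation_icc >= threshold  # robust under perturbation
    if reference and candidate:
        return "TP"  # robust under both conditions
    if not reference and not candidate:
        return "TN"  # non-robust under both conditions
    if not reference and candidate:
        return "FP"  # perturbation wrongly labels the feature robust
    return "FN"      # perturbation wrongly labels the feature non-robust

# Illustrative values, not taken from the study data.
print(classify(0.95, 0.93))  # TP
print(classify(0.70, 0.92))  # FP
```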

Discussion
We compared several methods for perturbing images to determine feature robustness. The chained perturbation consisting of noise addition, translation, volume adaptation and contour randomisation (NTCV) led to a low number of false positives in both cohorts, using test-retest robustness as reference. The TCV, RNCV and RCV perturbation chains showed similar performance. Hence, any of these perturbation chains may be used to assess feature robustness.
Other perturbation methods performed poorly, particularly if only one kind of perturbation was used, such as noise addition or simple rotations or translations. The combination of rotation and translation was not better than rotation or translation alone. Chaining methods that primarily alter the intensity content (noise, translation, rotation) with methods that update the region of interest mask (volume adaptation and contour randomisation) did improve results, i.e. fewer false positives with regard to test-retest imaging.
We used test-retest imaging as a reference standard. However, test-retest imaging has its limitations. In particular, the number of test-retest images is usually just two, which may not suffice to determine the ICC with good precision 18. This uncertainty is reflected in the 95% confidence interval of each ICC value. The average width of the 95% confidence interval of test-retest ICCs was 0.12 (NSCLC) and 0.35 (HNSCC). Image perturbations can be repeated multiple times, which allows a more precise estimation of the ICC. For instance, the average confidence interval width of the NTCV perturbation chain in the NSCLC cohort was similar to that of test-retest imaging, with a width of 0.11 for both CT 1 and CT 2 images. However, the perturbation ICC in the HNSCC cohort could be determined considerably more precisely, with average confidence interval widths of 0.18 (CT 1) and 0.17 (CT 2). The large uncertainty in test-retest robustness for the HNSCC cohort may have contributed to a higher number of false positives.
Another limitation of using test-retest imaging as a reference is that the test-retest images may still be too similar. The same equipment and protocols may be used, and segmentation may be performed by a single expert. Thus it cannot be ruled out that some of the false negative features were correctly assessed as non-robust by perturbation.
The above limitation may also explain the lower number of false negatives in the HNSCC cohort compared to the NSCLC cohort. Two different image acquisition protocols were used in the HNSCC cohort, whereas only one protocol was used for test-retest imaging in the NSCLC cohort. This is noticeable in the differences in exposure: the exposure between both HNSCC images differed by a factor of 4 on average, whereas exposure in the NSCLC set was similar between images. The HNSCC test-retest set may thus have captured differences in exposure. However, the effect of exposure and tube current on feature robustness has been contested. Larue et al. and Mackin et al. both found that exposure had a marginal effect on feature robustness 19,20, whereas Midya et al. found that it had a more pronounced effect 21. Test-retest robustness may also have been affected by the difference in reconstruction kernels. Though both kernels in the HNSCC cohort produce smooth images, different reconstruction kernels may strongly affect feature values 22,23.
The current study has some limitations. One limitation is that we only assessed test-retest imaging based on computed tomography, as test-retest data sets for other modalities were not available to us. The proposed methodology should be assessed for different modalities, e.g. positron emission tomography (PET) and magnetic resonance imaging (MRI).
Another limitation is that we did not assess delineation uncertainties, which also cause variability in feature values 24. The volume adaptation and contour randomisation perturbations attempt to simulate this uncertainty, but a comparison against a multiple-delineation data set should be performed in the future.
Perturbations allow us to perform repeated measurements, and it is important to consider how this may be used for radiomic modelling. We consider three methods for incorporating repeated measurements into radiomic modelling. The first, straightforward, method is to include only robust features in the modelling process. This method is currently used when robustness is determined using test-retest imaging, and its implementation into modelling workflows should therefore be easy. Moreover, this method is useful when only a subset of the development cohort is perturbed, or a separate data set is used for robustness analysis.
The second way to use repeated measurements for radiomic modelling is by averaging the measurements of each feature. Averaging effectively increases feature robustness, as the corresponding (panel/multiple rater) ICC is always higher than that of a single measurement 16. The mean values of the features that are robust according to the panel ICC are then included in the modelling process. This method requires that all images in the development cohort are perturbed, and is thus more computationally expensive than the first.
The final method builds upon the second, and is conceptually close to the use of image perturbations for deep learning. Instead of averaging values and selecting robust features prior to modelling, all values are included in the model development process. One advantage of this method is that information concerning the distribution of feature values within and across samples is not lost, and may be exploited during model development. Another advantage is that a robustness threshold is not required. However, this method does require that all images in the development cohort are perturbed, and may add complexity to radiomic modelling frameworks. A future study should compare these three methods and their effect on the performance of radiomic models.
In conclusion, we investigated the use of image perturbations to determine the robustness of radiomic features, using test-retest imaging as reference. Our findings indicate that chained perturbations which perturb both the image intensities and the segmentation may be used instead of test-retest imaging to determine feature robustness.

Test-retest cohorts
Two patient cohorts with test-retest computed tomography imaging were used: a publicly available non-small cell lung cancer cohort of 31 patients 25,26 and an in-house cohort (DRKS 00006007) of 19 patients with locally advanced head and neck squamous cell carcinoma 27. The NSCLC cohort is available from the Cancer Imaging Archive 28. For the NSCLC cohort, two separate images were acquired within 15 minutes of each other, using the same scanner and acquisition protocol. Images in the HNSCC cohort were acquired within 4 days of each other using different protocols, i.e. one CT image was acquired for 18F-fludeoxyglucose positron emission tomography (PET) attenuation correction, and the other for attenuation correction of 18F-fluoromisonidazole PET. Approval for analysis of the in-house data set was provided by the local ethics committee (EK 177042017). Image acquisition parameters for both cohorts are shown in supplementary note 1.
The GTV was delineated by experienced radio-oncologists (L.A., K.P., E.G.C.T) using the Raystation 4.6 treatment planning system software (RaySearch Laboratories AB, Stockholm, Sweden), and subsequently used as the region of interest.

Image processing
Image processing was conducted using the scheme and recommendations provided by the Image Biomarker Standardisation Initiative (IBSI) 29. An overview of the processing steps is provided in Figure 5, and further details may be found in the IBSI documentation. A complete overview of the image processing parameters, excluding perturbation-related parameters, may be found in Table 2.
In short, after loading a CT image, DICOM RTSTRUCT polygons were used to generate a voxelbased segmentation mask for the GTV ROI.The image and mask were then both rotated over a set angle θ (optional).Gaussian noise, based on the noise levels present in the original image, was added to the image (optional).Subsequently, both image and mask were translated with a sub-voxel shift η (optional) and interpolated with prior Gaussian anti-aliasing (supplementary note 2).After interpolation to isotropic voxel dimensions, the image intensity values were rounded to the nearest integer Hounsfield unit, and the mask was re-labeled based on the partial voxel volume threshold.The mask was then grown or shrunk to alter the volume by a fraction τ (optional), before being perturbed by supervoxel-based contour randomisation 30 (optional).The mask was subsequently copied to generate an intensity mask and a morphological mask.The intensity mask was resegmented to an intensity range which includes only soft-tissue voxels.Voxels with intensities deviating more than three standard deviations from the mean of the ROI were excluded from the intensity mask as well 31,32 .The image and both masks were subsequently used to compute radiomic features, with several feature families requiring additional discretisation (supplementary note 3).
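The ordering of these steps matters; it can be summarised schematically as follows (the step names are ours, for illustration only, and do not correspond to functions of the in-house framework):

```python
# Schematic order of the processing steps described above. Each step is
# represented by name only; optional steps are the perturbations.
STEPS = [
    ("rotation",              True),   # optional (angle theta)
    ("noise addition",        True),   # optional (Gaussian, image-based sigma)
    ("translation",           True),   # optional (sub-voxel shift eta)
    ("interpolation",         False),  # always: isotropic voxel spacing
    ("intensity rounding",    False),  # always: nearest integer HU
    ("mask re-labelling",     False),  # always: partial volume threshold
    ("volume adaptation",     True),   # optional (fraction tau)
    ("contour randomisation", True),   # optional (supervoxel-based)
    ("re-segmentation",       False),  # always: soft-tissue range, 3-sigma
]

def active_steps(perturbations):
    """List the steps executed for a given set of enabled perturbations."""
    return [name for name, optional in STEPS
            if not optional or name in perturbations]

print(active_steps({"noise addition", "volume adaptation"}))
```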

Image perturbations
Five basic image perturbation methods were implemented in the image processing scheme described above. These were rotation (R), noise addition (N), translation (T), volume adaptation (V) and contour randomisation (C). Examples are shown in Figure 1. Rotation perturbs the image and mask by performing an affine transformation that rotates both in the axial (x, y) plane, i.e. around the z-axis, over a specified angle θ ∈ [−13°, 13°]. Noise addition perturbs image intensities by adding random noise drawn from a normal distribution with mean 0 and a standard deviation equal to the estimated standard deviation of the noise present in the image. Translation perturbs the image and mask by performing an affine transformation that shifts both by specified fractions (η ∈ [0.25, 0.75]) of the isotropic voxel spacing along the x, y and z axes. Volume adaptation grows and/or shrinks the mask by a specified fraction τ ∈ [−0.28, 0.28]. Contour randomisation is based on simple linear iterative clustering 30, and perturbs the mask by randomly selecting supervoxels based on their overlap with the original mask. The algorithmic implementation of these perturbations is described in supplementary note 4.
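As an illustration, the in-plane rotation could be sketched with scipy.ndimage.rotate (a stand-in for the affine transformation actually used; the toy volume and the re-thresholding of the interpolated mask at 0.5 are our assumptions):

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
image = rng.normal(size=(8, 32, 32))   # toy (z, y, x) volume
mask = np.zeros_like(image, dtype=float)
mask[2:6, 10:20, 10:20] = 1.0

theta = 13.0  # rotation angle in degrees, within [-13, 13]
# Rotate in the axial (x, y) plane, i.e. around the z-axis.
image_rot = ndimage.rotate(image, angle=theta, axes=(1, 2),
                           reshape=False, order=1)
mask_rot = ndimage.rotate(mask, angle=theta, axes=(1, 2),
                          reshape=False, order=1)
mask_rot = mask_rot >= 0.5  # re-threshold the interpolated mask

print(image_rot.shape)  # (8, 32, 32)
```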
Perturbations were chained using the settings documented in supplementary note 5. Each rotation angle and volume adaptation fraction led to the generation of a new image. Noise addition and contour randomisation could be repeated multiple times, with each repetition producing a new perturbed image. The translation fractions were permuted over the different directions. For example, the translation fractions η = {0.25, 0.5} yield 2³ = 8 permutations over the three axes. When chaining perturbations, all provided parameters were permuted.
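For example, the permutation of translation fractions over the three axes can be generated with itertools.product:

```python
from itertools import product

eta = (0.25, 0.5)  # translation fractions of the isotropic voxel spacing
# One shift per axis (x, y, z); every combination yields a perturbed image.
shifts = list(product(eta, repeat=3))
print(len(shifts))  # 8
print(shifts[0])    # (0.25, 0.25, 0.25)
```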
An overview of the perturbation chains and the number of perturbed images generated is shown in Table 1. All perturbation chains produced between 27 and 40 perturbed images.

Features
All features defined in the IBSI documentation were implemented 29, leading to a basic set of 182 features. These features were calculated at multiple scales, namely for isotropic voxel spacings of 1, 2, 3 and 4 mm 33. 118 features of the basic set required discretisation. Both fixed bin number and fixed bin size discretisation algorithms were used, each with four settings. Thus, 4032 features were computed in each image. Supplementary note 3 contains further details with regard to feature computation.
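The total of 4032 features per image follows directly from these counts; as a sanity check:

```python
base = 182         # basic IBSI feature set
discretised = 118  # features requiring discretisation
settings = 2 * 4   # two discretisation algorithms, four settings each
scales = 4         # isotropic voxel spacings of 1, 2, 3 and 4 mm

# Discretised features are computed once per discretisation setting,
# the remaining features once; everything is repeated per scale.
total = (discretised * settings + (base - discretised)) * scales
print(total)  # 4032
```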
Both image processing and feature computation were conducted using our IBSI-compliant in-house framework based on Python 3.6 34.

Robustness analysis
Feature robustness was assessed using the intraclass correlation coefficient (1,1) (ICC) 16, based on the assumption that test-retest images, as well as perturbations, possess no consistent bias. The highest possible ICC value is 1.00, which indicates that feature values are fully repeatable between test-retest images or perturbations. Lower values denote an increasing measurement variance with respect to the inter-patient variance, and thus lower repeatability. Image features with ICC ≥ 0.90 were considered to be robust 17, and non-robust otherwise.
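ICC(1,1) follows the standard one-way random-effects, single-measurement definition; a minimal Python sketch for illustration (the study itself used code adapted from the psych R-package):

```python
import numpy as np

def icc_1_1(values):
    """ICC(1,1): one-way random effects, single measurement.

    values: (n_subjects, k_measurements) array, e.g. one feature's values
    for each patient in the test and retest images (k = 2), or across
    image perturbations (k > 2).
    """
    values = np.asarray(values, dtype=float)
    n, k = values.shape
    grand_mean = values.mean()
    row_means = values.mean(axis=1)
    # Between-subject and within-subject mean squares.
    msb = k * np.sum((row_means - grand_mean) ** 2) / (n - 1)
    msw = np.sum((values - row_means[:, None]) ** 2) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Perfectly repeated measurements give an ICC of 1.
print(icc_1_1([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]))  # 1.0
```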
The test-retest ICC was determined between both CT images, see Figure 2. Perturbation ICCs were first computed separately for the test and retest images. Subsequently, perturbation ICCs were averaged over the test and retest images to facilitate comparison with the test-retest ICC, as there was no consistent bias toward higher ICC values for either image set (see supplementary note 6).
Feature robustness was assessed using R 3.4.2 35, and ICCs were computed using code adapted from the psych R-package 36.

Supplementary note 1: image acquisition parameters
Computed tomography (CT) images were acquired for both the non-small-cell lung carcinoma (NSCLC) and head and neck squamous cell carcinoma (HNSCC) cohorts. For the NSCLC cohort, a second CT image was acquired 15 minutes after the first acquisition. The patient was asked to leave the table between the scans and was repositioned before the second image acquisition.
For the HNSCC cohort, a second CT image was recorded to determine attenuation corrections for positron emission tomography (PET). This PET-CT scan was recorded within 4 days after the original diagnostic CT scan. Acquisition parameters and characteristics are shown in Table S1.

Table S1 - Image acquisition parameters and characteristics for both NSCLC and HNSCC image data sets. Parameters were determined from the CT slices that contain portions of the gross tumour volume (GTV) region of interest. Numeric parameters are presented as median (range), unless only one value was found within the cohort. Image noise was calculated using Chang's method 37 and represented by its standard deviation σ (supplementary note 4). kVp: peak kilovoltage; HU: Hounsfield unit.

Supplementary note 2: pre-interpolation low-pass filtering
Image features are computed from voxels with uniform dimensions. In this work, features are computed for voxel spacings of 1, 2, 3 and 4 mm. The original in-plane spacing of the CT images is between 0.51 and 1.37 mm. We therefore need to down-sample images, which may cause image artefacts through aliasing and thus reduce feature robustness. In signal analysis, a signal may contain only frequencies up to half the sampling frequency (the Nyquist frequency ω_N) of the down-sampled signal to avoid artefacts. Signals are therefore low-pass filtered before down-sampling to suppress high-frequency content. The same concept applies to images. However, the application of low-pass filters in radiomics is often neglected, despite the beneficial effect on feature robustness 20.
We use a low-pass Gaussian filter before interpolation (scipy.ndimage.gaussian_filter). The Gaussian function g(x) is defined as:

g(x) = 1/(σ √(2π)) exp(−x²/(2σ²)),

with σ the standard deviation, or width, of the distribution. σ is an input parameter of the Gaussian filter for which optimal settings have not been established. Moreover, σ needs to be defined with respect to the typically non-uniformly spaced coordinate grid of the original image, and is thus specified separately for each axis.
Fourier theory allows us to set σ based on the Nyquist frequency. The Fourier transform of the Gaussian function g is 38:

ĝ(ω) = exp(−σ²ω²/2),

with ω a frequency. An ideal low-pass filter would maintain all frequencies ω < ω_N and completely remove all frequencies ω ≥ ω_N. However, ideal filters do not exist, and a compromise is required between the desired attenuation of high-frequency content and the unwanted attenuation of low-frequency content. We therefore define a smoothing parameter β, with 0 < β ≤ 1, as the value of the Fourier-transformed Gaussian at ω = ω_N:

ĝ(ω_N) = exp(−σ² ω_N²/2) = β.   (1)

The Nyquist frequency ω_N may be expressed in terms of voxel spacing. Suppose a one-dimensional array of voxels with spacing d_1 is to be resampled to spacing d_2. The sampling frequency of the resampled array is then ω_s = 2π/d_2, and its Nyquist frequency is ω_N = ω_s/2 = π/d_2. We now solve equation (1) for σ:

σ = √(−2 ln β) / ω_N.

We assess different parameter settings for β, namely β = {0.50, 0.70, 0.80, 0.85, 0.90, 0.93, 0.95, 0.97}, as well as no low-pass filtering. Test-retest intraclass correlation coefficients (ICC (1,1)) and their 95% confidence intervals (CI) are calculated on both test-retest cohorts 16. The ICCs are used to determine the number of robust features and to show the ICC distribution. In addition, the distribution of the width of the ICC 95% confidence intervals is assessed.
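Following these equations, σ can be computed per axis; a minimal sketch (assuming the normalisation reconstructed above, with σ expressed in units of the original voxel spacing along the axis in question):

```python
import math

def gaussian_sigma(beta, target_spacing, original_spacing):
    """Width of the pre-interpolation Gaussian low-pass filter.

    Solves exp(-sigma^2 * omega_N^2 / 2) = beta for sigma, with the
    Nyquist frequency omega_N = pi / target_spacing, and expresses sigma
    in units of the original voxel spacing along the corresponding axis.
    """
    omega_n = math.pi / target_spacing
    sigma_mm = math.sqrt(-2.0 * math.log(beta)) / omega_n
    return sigma_mm / original_spacing

# Down-sampling a 0.98 mm in-plane grid to 3 mm with beta = 0.93:
print(round(gaussian_sigma(0.93, 3.0, 0.98), 3))  # 0.371
```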
Example images of an interpolated slice acquired from an NSCLC and an HNSCC patient are shown in Figures S1 and S2, respectively. Down-sampling without prior low-pass filtering caused visible image artefacts. On the other hand, images that are smoothed with a wide Gaussian low-pass filter (low β value) lack detail.
The percentage of robust features according to the test-retest ICC is shown in Figure S3. For the NSCLC cohort, even very light smoothing (β = 0.97) increases the percentage of robust features from 59.0% to 75.9%. With lower β-values, this percentage does not change, nor does the distribution of ICCs (Figure S4) or the distribution of ICC CI widths (Figure S5). For very low β-values, the ICC distribution for NSCLC may be less stable.
For the HNSCC cohort, the percentage of robust features increases with decreasing β, which is also reflected in the ICC distribution. In particular, even very mild smoothing (β = 0.97) increased the median ICC from 0.63 to 0.76. When only features computed with minimal down-sampling (1 mm) are considered, β = 0.97 reduced the median ICC from 0.72 to 0.65, which only recovered at β = 0.93. The same may be observed for the ICC CI width, which was increased for β = 0.97. A smoothing parameter value between β = 0.93 (robust features: 34.0%; median ICC: 0.85; median CI width: 0.29) and β = 0.90 (robust features: 43.0%; median ICC: 0.88; median CI width: 0.23) offers a good compromise between aliasing and loss of image detail.

Noise
Noise affects voxel intensities. Reproducible features should be robust to the noise present in an image. Perturbation by noise addition therefore follows two steps: first, the noise-dependent intensity variance is determined; second, noise drawn from a Gaussian distribution with the same variance is added to the image.
The method of Chang et al. 37,45 is used to determine the noise variance. In short, the image I is filtered in both the x and y direction in the image plane (z being the axis along which the image slices are stacked) using a one-dimensional stationary coiflet-1 wavelet high-pass filter, pywt.Wavelet("coif1").dec_hi. The filter convolution was implemented using the scipy.ndimage.convolve1d function. This cascaded filter operation yields I_diff. Subsequently, the noise level is estimated using the robust median estimator:

σ_noise = median(|I_diff|) / 0.6745.

Then, for every image voxel, random noise from a normal (Gaussian) distribution with mean 0 and standard deviation σ = σ_noise is generated (numpy.random.normal) and added. After noise addition, intensities are rounded to the nearest integer value to conform with the expected integer Hounsfield units in CT.
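The estimation and addition steps might be sketched as follows (a toy volume of pure Gaussian noise stands in for a CT image, so the estimate should recover the known noise level; the 0.6745 factor is the standard robust median-to-σ conversion):

```python
import numpy as np
import pywt
from scipy.ndimage import convolve1d

rng = np.random.default_rng(0)
# Toy volume of pure Gaussian noise (sigma = 20) in place of a CT image.
image = rng.normal(loc=0.0, scale=20.0, size=(4, 64, 64))

# High-pass filter each slice in x and y with the coiflet-1 wavelet filter.
hp = np.asarray(pywt.Wavelet("coif1").dec_hi)
diff = convolve1d(convolve1d(image, hp, axis=2), hp, axis=1)

# Robust noise estimate: median absolute value divided by 0.6745.
sigma_noise = np.median(np.abs(diff)) / 0.6745
print(sigma_noise)  # close to 20

# Add Gaussian noise with the estimated standard deviation, round to HU.
noised = np.rint(image + rng.normal(0.0, sigma_noise, size=image.shape))
```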
Noise variance is determined on the original image data, before any rotation, translation or other operation occurs. In the image processing scheme, noise addition takes place after rotation of the image, if applicable.

Translation
Translation, like rotation, emulates changes due to different patient positioning. Translation was performed concurrently with interpolation, i.e. the interpolation grid was shifted off-centre by the provided translation fraction η multiplied by the interpolation grid spacing. Translation was conducted along the x, y and z axes. Translation and interpolation were conducted with tri-linear interpolation using the scipy.ndimage.map_coordinates function.
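A sketch of a sub-voxel shift with scipy.ndimage.map_coordinates (the toy volume and the grid construction are illustrative; here the shift is applied directly in voxel units rather than combined with resampling):

```python
import numpy as np
from scipy.ndimage import map_coordinates

rng = np.random.default_rng(0)
image = rng.normal(size=(6, 16, 16))  # toy (z, y, x) volume

eta = (0.25, 0.25, 0.5)  # sub-voxel shift as a fraction of the grid spacing
# Build the sampling grid, shifted off-centre by eta along each axis.
grid = np.meshgrid(*[np.arange(s, dtype=float) for s in image.shape],
                   indexing="ij")
coords = [axis + shift for axis, shift in zip(grid, eta)]

# Tri-linear interpolation at the shifted coordinates (order=1).
translated = map_coordinates(image, coords, order=1, mode="nearest")
print(translated.shape)  # (6, 16, 16)
```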

Volume adaptation
Shrinking or growing the segmentation mask is a method to mimic variance in expert delineations. For example, Fotina et al. reported a mean coefficient of variance in volume of 14.9% (range: 4.4–29.3%) in CT-based expert delineations for lung and prostate cancer. The proposed method for volume adaptation is simple and intensity-agnostic, and is conducted as follows:
1. Approximate the volume V_0 of the ROI R_0 by counting the number of voxels in the mask.
2. Calculate the volume of the ROI after adaptation (rounded down to the nearest integer): V_a = V_0 (1 + τ), with τ the required growth/shrinkage fraction. τ > 0.0 indicates volume growth, and τ < 0.0 indicates shrinkage.
3. Define a geometric structure element that includes all voxels within Manhattan distance 1 (i.e. a centre voxel and its directly adjacent neighbours). We used the scipy.ndimage.generate_binary_structure(3, 1) function.
4. Initialise a place-holder for the adapted mask R_p with volume V_p by copying the original ROI and its volume. This place-holder is used to track the volume and mask over iterative adaptations.
5. Iterate the mask growth/shrinkage process until the loop breaks:
(a) If τ > 0.0, dilate the mask (scipy.ndimage.binary_dilation) once, using the structure element defined in step 3.
(b) If τ < 0.0, erode the mask (scipy.ndimage.binary_erosion) once, using the same structure element.
(c) Approximate the volume V_n of the newly adapted mask R_n by counting the number of voxels in the mask.
(d) If V_n = 0, break from the loop.
(e) If τ > 0.0 and V_n > V_a, break from the loop.
(f) If τ < 0.0 and V_n < V_a, break from the loop.
(g) Replace the previous place-holder mask by setting R_p = R_n. This is done until the final growth/shrinkage iteration, when one of the conditions in steps (d)-(f) is satisfied.
6. If V_n ≠ V_a, R_n contains either too many (τ > 0.0) or too few (τ < 0.0) voxels. A limited number of voxels should be added to or removed from the mask R_p to complete the adaptation. Practically, we update the rim formed by the disjunctive union of R_p and R_n, i.e. R_r = R_n ⊕ R_p:
(a) Determine the number of voxels to be added to or removed from the mask: N = |V_a − V_p|.
(b) Find the rim R_r by logical XOR comparison of R_n and R_p (numpy.logical_xor).
(c) Select N voxels from the rim at random, without replacement (numpy.random.choice).
(d) If N > 0 and τ > 0.0, add the N voxels to mask R_p.
(e) If N > 0 and τ < 0.0, remove the N voxels from mask R_p.
7. Volume adaptation ends. The mask R_p defines the perturbed region of interest.
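The steps above can be sketched as follows (a simplified implementation on a toy mask; the study's framework may differ in details such as tie-breaking and volume rounding):

```python
import numpy as np
from scipy import ndimage

def adapt_volume(mask, tau, rng):
    """Grow (tau > 0) or shrink (tau < 0) a binary mask to a target volume.

    Simplified sketch of steps 1-7 above; volumes are voxel counts.
    """
    target = int(mask.sum() * (1.0 + tau))             # steps 1-2
    element = ndimage.generate_binary_structure(3, 1)  # step 3
    previous = mask.copy()                             # step 4
    while True:                                        # step 5
        if tau > 0.0:
            current = ndimage.binary_dilation(previous, element)
        else:
            current = ndimage.binary_erosion(previous, element)
        volume = current.sum()
        if volume == 0 or (tau > 0.0 and volume > target) \
                or (tau < 0.0 and volume < target):
            break
        previous = current
    # Step 6: randomly add/remove rim voxels to hit the target volume.
    rim = np.flatnonzero(np.logical_xor(current, previous))
    n = abs(target - int(previous.sum()))
    chosen = rng.choice(rim, size=min(n, rim.size), replace=False)
    result = previous.copy().ravel()
    result[chosen] = tau > 0.0    # add voxels when growing, remove otherwise
    return result.reshape(mask.shape)

rng = np.random.default_rng(0)
mask = np.zeros((20, 20, 20), dtype=bool)
mask[5:15, 5:15, 5:15] = True  # 1000-voxel cube
grown = adapt_volume(mask, 0.1, rng)
print(grown.sum())  # 1100
```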

Contour randomisation
Multiple image segmentations are required for randomising the contour of the region of interest. Creating multiple segmentations usually requires delineation by multiple experts. However, for larger quantities of image data, the creation of multiple manual delineations is extremely time-consuming and unfeasible in practice. An automated contour randomisation is therefore required. We use a supervoxel-based segmentation algorithm for randomising contours. Supervoxels are connected clusters of voxels with similar intensity characteristics. To create a random contour, we compare supervoxels with a single segmentation delineated by an expert. The region of interest (ROI) is then randomised based on the overlap of supervoxels with the expert contour. Multiple algorithms produce supervoxels; we used the simple linear iterative clustering (SLIC) algorithm as it efficiently produces compact, contiguous supervoxels 30. This algorithm was provided through the skimage.segmentation.slic function.
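The overlap-based randomisation can be sketched without the SLIC step itself (a toy supervoxel labelling stands in for skimage.segmentation.slic, and the probabilistic inclusion rule is our simplified reading of the method, not the study's exact sampling scheme):

```python
import numpy as np

def randomise_contour(supervoxels, roi, rng):
    """Randomly include supervoxels according to their overlap with the ROI.

    A supervoxel overlapping the expert ROI by fraction p is included in
    the randomised mask with probability p (simplified sketch).
    """
    new_mask = np.zeros_like(roi, dtype=bool)
    for label in np.unique(supervoxels):
        voxels = supervoxels == label
        overlap = roi[voxels].mean()  # fraction of the supervoxel in the ROI
        if rng.random() < overlap:
            new_mask |= voxels
    return new_mask

rng = np.random.default_rng(0)
supervoxels = np.arange(64).reshape(4, 4, 4) // 4  # toy labelling
roi = np.zeros((4, 4, 4), dtype=bool)
roi[1:3, 1:3, 1:3] = True
print(randomise_contour(supervoxels, roi, rng).shape)  # (4, 4, 4)
```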
Contour randomisation is conducted as follows:
1. Both the image and the region of interest (ROI) mask are cropped to 25 mm around the ROI bounding box to limit computational costs.

Figure S1 - Effect of smoothing and interpolation on a CT slice of an NSCLC patient. A Gaussian smoothing filter for the given β-values was applied before interpolation. Afterwards, tri-linear interpolation was conducted to resample to uniform voxel spacing (in mm). All slices are shown at the same size for comparison, and intensities were windowed between [−400, 300] HU.

Figure S5 - Distribution of the 95% confidence interval (CI) widths of the test-retest intraclass correlation coefficients (ICC (1,1)) for a pre-interpolation Gaussian smoothing parameter β. Higher CI widths indicate larger variance in feature values between test and retest images. Lower β-values indicate stronger smoothing. The areas of the distributions were normalised. The median 95% CI width is indicated in each distribution by a horizontal line. ICC CI width distributions are shown for all features (a) and for features acquired using a uniform spacing of 1 mm (b).

Table 1 - List of perturbations, with their abbreviations and the number of different images generated by each perturbation. The settings used by each perturbation chain are listed in supplementary note 5.

Figure 3 - Fraction of robust features identified for test-retest (orange) and perturbations (blue). Robustness was assessed using the intraclass correlation coefficient (ICC). Features with ICC ≥ 0.90 were considered to be robust. Perturbations are abbreviated, see Table 1: R: rotation; N: noise addition; T: translation; V: volume adaptation; C: contour randomisation.

Table 2 -
Image processing parameters for both NSCLC and HNSCC data sets. The isotropic voxel spacing is defined in three dimensions, i.e. a spacing of 2 mm corresponds to a voxel dimension of 2 × 2 × 2 mm. Discretisation was performed using two methods (fixed bin number and fixed bin size) with varying bin settings. ROI: region of interest; HU: Hounsfield unit; σ: standard deviation of voxel intensities within the region of interest.

Table S2 -
Feature families and the number of computed features. Several features require discretisation prior to computation. As two discretisation methods, each with four bin settings, were evaluated, the number of such features is multiplied by 8. The final number of features is 4032, due to the calculation of each feature for four different interpolation spacings.